WORDLIST IARCS

sandy999 · November 18, 2014, 9:39pm

This is a basic question, considering the fact that it is in the Basic Division of IARCS. Yet, I remain pretty confused. I just tried this problem after attempting to improve my understanding of C++ strings and their functions yesterday.

Since it isn’t nice to post a question without a description of what I tried, I have described my silly approach.

Problem Statement: http://opc.iarcs.org.in/index.php/problems/WORDLIST

My code: https://www.dropbox.com/s/t7yffe6uwfqen0w/WORDLIST%20Code.docx?dl=0

I understand that these are a few of the major reasons behind my program failing

If a word is present more than once in the passage, it repeats in my output which is not supposed to happen according to the question
When a punctuation along with a blank space is encountered which is usually the case, I jump 2 or more words in my Word array, which thus results in blank words in my Word array being counted
I assume that the passage will end with a full stop (or any other punctuation)

Which only explains that my so called method is COMPLETELY flawed.

What I think I should consider

They have specified the max number of lines and max number of characters in a line. Maybe my logic should center around that. But I am not sure how.

There probably is a better way to put the words from the passage into my Words array, instead of the unorganized character by character method (I learnt from a previous post how to do so) which I have employed.

This is what I have tried and thought over so far. But I don’t have enough knowledge to rectify my errors and think of a better algorithm. Could someone please help me out?

P.S. Please let me know if the code link doesn’t work. I thought a link would make the post look cleaner.

virresh · December 1, 2014, 8:58pm

I also tried to solve the same problem and the code successfully runs on the provided sample input and some custom inputs too but gives a segmentation error on the judge.

Here’s a link to my code :
https://www.dropbox.com/s/5qq10t9kygwdqvi/wordlist.cpp?dl=0

What I have done :

Stored the word’s starting index in an array.
Sorted my array so as to arrange the indexes in a way that the letters occurring at that position in my original array are in alphabetical order. eg:-

original array : a test sentence.

index array : 0 2 7

after sorting: 0 7 2

Print elements of my original array as per my sorted index array

After spending some time I failed to identify my errors.
Any corner cases pointed out are welcomed.

sandy999 · December 1, 2014, 10:21pm

@virresh

Your input method was an eye opener for me…I suppose it’s a very basic concept or something, but if you don’t mind, could you briefly explain how you took the input? Especially the nature of your ‘te’ variable? The type of ‘te’ and the various operations you used on the ‘te’ variable were quite new to me.

What is the advantage of declaring te = ’ ’ in line 4? Does the compiler automatically consider it as a single character? Same way, why do you write a[ap]=’ ’ in line 17?

Same way in line 20, I don’t understand what exactly is happening. Sorry that I am such a dimwit.

And you can download some sample test data for this problem from the IARCS site itself here Indian National Olympiad in Informatics - Online Programming Contest - IARCS

virresh · December 3, 2014, 11:54am

@sandy999

Actually I am exploiting the fact that the memory implementation of the char data type is in the form of integers (their ascii values). For taking input character by character I use cin.get function. With a normal cin any white-space characters would get ignored but not with get.
My te (short for temporary) is a character variable that takes in a single character at a time and checks it for the value it carries. If it’s a character I store it and otherwise I discard it (that’s why any punctuation simply is ignored).

I declared te initially as a not so frequently used character ~ else it would carry a junk value. Even if it hadn’t been that way, any junk value wouldn’t matter as long as it doesn’t results into a ‘\n’ character for that would interfere with my while loop.

For the line 17 and 20 and many others assigning a character spaces simply refers to the ascii value 13 (the space character). I didn’t use 13 or any other ascii value directly because compiler automaticaly does that when we use Single Quotes with a Single Character in it.

After seeing the test data I realized that I hadn’t accounted for the enormously huge data size and a few repetition cases. After some debugging the new and working code with few substantial changes (although has a TLE on last test case) is

here : http://ideone.com/cQRIV5

Sorry for my hard to understand notations for variables and I hope that you can optimise it still to take care of the time limit.
Rest all implementations remain the same and

PS: The header file cstdio was included for debugging purposes only.

batcrazt · December 19, 2014, 9:30pm

Hey! I am little late, but I think I can help you with this.

To avoid repetition, you can search the array and make sure that you are not adding words that are already present. You can do this by using STL ‘find’ or just by binary search.
You are trying to split the words using spaces and punctuations. Instead, you can split the words when the character falls outside the range 97 to 122 (‘a’ to ‘z’). This will make the implementation more straight forward.

You can take a look at my code if you like:dCxqr3 - Online C++ Compiler & Debugging Tool - Ideone.com. It passes all the test cases. It is rather poorly written, so if you have any questions you can just ask me. Also, if there are ways I can improve my code let me know!

batcrazt · December 21, 2014, 1:56pm

@sandy999 I used getline to get the whole line as input and processed each line individually. I converted all the characters to lower case, and then I found out where each word starts and ends. Basically, the word starts at n if line[n-1] is outside the range and line[n] is within the range, and the word ends at m if line[m] is within the range and line[m+1] is outside the range. By keeping track of n and m I used substr function to generate the word(http://www.cplusplus.com/reference/string/string/substr/). I added the word to a vector, if the word isn’t already present and then sorted it using the inbuilt STL sort. I’ll try to share my code again, not sure what’s wrong.
Also, you can input one character at a time too, but since there are white spaces and multiple lines get quite tricky. I think taking the whole line as a input (using getline) works better. Just my opinion though.

sandy999 · December 3, 2014, 7:33pm

Thank you so much @virresh for your wonderful explanation. I never thought of bringing in ascii values. I’ll try the problem again soon and let you know my progress.

sandy999 · December 21, 2014, 12:01am

Hey @batcrazt I am still a little confused about the way I should take the input. I am attempting to take input one character at at time and add it to the word array if it is in the 97-122 range. If it is a newline, I increment my ‘number of words’ index and continue. But which input function (like cin.get, gets, getline etc) should I use to accept and check every character that way?

If it is too basic, could you please direct to me a resource which has more about manipulating string functions? And btw, your code isn’t working

Thanks a lot!

sandy999 · December 21, 2014, 2:20pm

Thank you soo much @batcrazt! Your manipulation of the string functions is brilliant! I have always been a tad confused about strings as I had never solved a proper competitive string problem before and only today I began learning vectors. Btw, I managed to open your code. The full stop at the end of it was also hyperlinked and was preventing it from opening. So, this shall be my first strings and vectors problem! Thanks again!

sandy999 · December 21, 2014, 2:43pm

Sorry for bothering you…another question. The cin.ignore() at the end of your input of number of lines, I tried removing it and it wasn’t taking the input of text for more than 1 line. What exactly is the role of the cin.ignore()? I googled it up but was unable to figure out its role in this context.

batcrazt · December 21, 2014, 4:01pm

Getline works by extracting characters until it detects a newline character("\n"). When cin is used, the user presses ‘enter’ ,after the input, adding a “\n”. If getline is used immediately after cin, getline will exit as soon as it detects that newline character. Using cin.ignore() after cin and before getline will help overcome that problem.
If cin.ignore() is not used in the case of my program, then the first line won’t be read.

sandy999 · December 21, 2014, 4:26pm

OMG! Finally I understood! Never knew vectors would be so useful here. Thank you