WORDLIST - Editorial

Problem Link - Word List

Problem Statement

In this problem the input will consist of a number of lines of English text consisting of the letters of the English alphabet, the punctuation marks ’ (apostrophe), . (full stop), , (comma), ; (semicolon), :(colon) and white space characters (blank, newline).

Your task is print the words in the text in lexicographic order (that is, dictionary order). Each word should appear exactly once in your list. You can ignore the case (for instance, “The” and “the” are to be treated as the same word). There should be no uppercase letters in the output.

For example, consider the following candidate for the input text:

This is a sample piece of text to illustrate this 
problem.

The corresponding output would read as:

a
illustrate
is
of
piece
problem
sample
text
this
to

Approach

The idea behind solving this problem is to process the input text in a way that we can easily extract and sort the words. First, we want to make the comparison of words case-insensitive, so we convert the entire text to lowercase. Then, we remove any unwanted punctuation marks from the text so that we are only left with valid words. After removing the punctuation, we split the text into individual words. Since there could be duplicate words, we use a set to automatically eliminate any repetitions. Finally, we sort the words in lexicographic order and print them one by one. The output must be the unique words, sorted, and printed in lowercase without any punctuation.

Time Complexity

The time complexity is O(k \log k), where k is the number of unique words, because sorting the words is the most time-consuming part.

Space Complexity

The space complexity is O(k), where k is the number of unique words stored in memory.