TMP01 - Editorial

Problem Link:

Practice

Contest

Difficulty:

Hard

Pre-requisites:

Suffix-trees

Problem:

Start with an empty string, and do the following operations:

  • Append a character to the end of string.
  • Remove the first character of the string.
    After each operation, report the number of distinct substrings in the string.

Explanation:

This was easily the hardest problem. The setters solution was based on Ukkonen’s Algorithm‎. The time complexity is O(Q).

Ukkonen’s algorithm has been described very neatly on stackoverflow. It has also been described in short in these slides. I would recommend reading the stackoverflow post(if possible, the slides as well) before we continue. Here is link to the published paper for the enthusiasts. It can also be found on e-maxx.ru.

The sum of length of all edges of the suffix tree is equal to the number of distinct substrings in the string. So, If we had to merely append characters to the end of string, it can be easily done with Ukkonen itself, because we could maintain sum of length of all edges. We also need to maintain the number of leaves, because the length of each leaf edge increases by one after each step. Length of remaining edges remain same, unless the edge has been split in previous step.

Ukkonen’s algorithm can be modified to handle deletions as well, and its not too hard to figure out. Here are the details.

The first step is to realize that each leaf corresponds to a unique suffix of the string, and we can do bookkeeping to maintain the leaves for each suffix. Only those suffixes whose length is equal to remainder(refer to stackoverflow article) or less have no leaves corresponding to them. The algorithm will automatically ensure that we remove only those suffixes for which a leaf exists.

Due to bookkeeping, we can get hold of the leaf corresponding to the current string. We only need to remove this leaf. We also might need to remove some internal edges that were being used only by the suffix we are removing. To see this, consider the suffix tree for string “aaac”.

Suppose we remove the first ‘a’ from string now. We would just need to remove corresponding leaf(i.e. node 3). Now suppose we were to remove the second ‘a’ as well. Then we would need to remove vertex 4, as well as vertex 2, because the 1-2 edge is being used only by this suffix. In general, when removing a leaf, keep removing its parent as well if

  • The parent has out degree 0.
  • And it is not the active node.

To see the reason for point 2 above, consider the string “aaacaac”. The active node after adding last ‘c’ will be node 2. Now lets say, you remove the first ‘a’ from the beginning. After that your suffix tree looks like:

Now you also want to delete the second ‘a’. However, you cannot remove node 2, because the node 2 may have outdegree 0, but it is being used by the active suffix. After removal of node 4, our suffix tree looks as follows.

This example also illustrates that after removing the front character, the active point itself can become a leaf. This can be identified as active edge child of active node becoming null. In this case, you will need to establish the current active point as a leaf, and move to suffix link of active node. This will ensure a condition we had in the beginning, that “The algorithm will automatically ensure that we remove only those suffixes for which a leaf exists”. This is all the high level details you need to know. Read the solutions below for details. Some of the top solutions are very neat as well and can be read.

Acrush and few others used suffix arrays and LCP array with high large amount of optimizations to pass the time limit.

Setter’s Solution:

Can be found here

Tester’s Solution:

Can be found here

Editorialist’s Solution:

Can be found here

8 Likes

We can solve this problem by Suffix Array in O(nlogn) time.

  • Read all the qureys and ignore all the ‘-’ to get the whole string, reverse it and build SA
  • Query ‘+’ stands for Add a Suffix longer than all the existing ones
  • Query ‘-’ stands for delete the last char in all the suffix

We create a seg-tree to maintain the suffix, rules are:

  • suffixes are ordered the same as they are ordered in SA
  • No suffix is the prefix of other suffixes
  • suppose a suffix CUR, and its previous suffix PRE, we maintain Len(CUR) - LCP(CUR, PRE) for CUR, we call it RemL(CUR)
  • suppose a suffix CUR, and its next suffix NXT, we maintain Len(CUR) - LCP(CUR, NXT) for CUR, we call it RemR(CUR)

Let’s talk about the two operations

  • When we add a suffix CUR, check its previous suffix and the next suffix if PRE or NXT is the prefix of CUR. If it does, delete it
  • When deleting char from all the suffix, we obtain that all the RemL and RemR are reduced by 1.Check if any of them reduce to 0 and delete that suffix
  • at any time Ans are the sum of all the RemL. Just maitain it in the seg tree.
  • use seg-tree to get PRE, NXT, LCP and do each insert or reduce operation in O(logn) time.

Then the problem is solved.
A good implementation is needed to avoid TLE.

8 Likes

can someone describe the suffix array solution, i’m having trouble trying to understand ACRush solution, does it suppose to work in (n^2) at worst case ?

1 Like

I got AC this way. CodeChef: Practical coding for everyone

This is essentially my solution, too. But I needed to obtain the LCP of two suffixes in O(1) time (by using O(1) RMQ queries) in order to avoid TLE.

At the beginning of the editorial the difficulty of this problem is written as “Medium”, instead of “Hard” (as its label suggests, and as it is obvious to most of us). Is this a mistake or is there any other reason for considering it of “Medium” difficulty?

2 Likes

Well, fixed now.

@mugurelionut i’m also thinking of the same thing.

@mugurelionut I used seg-tree to perform rmq query and also got AC. The slowest part is maintain RemL, RemR, sum and cnt while inserting or deleting a suffix. I try to reduce “updates per operation” and got AC.

how can this problem be solved by Suffix Automaton?

Could someone provide sample input and output so we can test our code for correctness?