GERALD3 - Editorial



Author: Gerald Agapov
Tester: Tasnim Imran Sunny
Editorialist: Jingbo Shang




Suffix Array, Longest Common Prefix, Balanced Binary Search Tree,


Given a string S[1…N], answer M queries. The i-th query is to find out how many different substrings are there in all substrings of S starting from position Li and Ri.


First of all, let’s solve this easier problem: How many distinct substrings are there in all substrings of S? It is a classical problem, such as SPOJ 705. New Distinct Substrings. The solution is to firstly sort all suffixes of S, denoted as SA[1…N] (the element is the strings here, but you can store their starting position). And then, calculate the longest common prefix (LCP) between SA[i] and SA[i+1], denoted as LCP[i]. Finally, the answer is N(N+1)/2 - LCP1 - LCP2 - … - LCP[N-1]. To do this process efficiently and conveniently, we can ask Suffix Array for help. There are a lot of algorithms to build the Suffix Array and calculate the LCP. It can be done in O(N) time. Also, you can try Suffix Tree or Suffix Automaton.

One more useful property here is that:

LCP(SA[L], SA[R]) = min{LCP[L], LCP[L+1], ... LCP[R-1]}

which will be used later. Note that we can tackle this RMQ problem with some preparation such that only O(1) time is needed for each query. (e.g. via ST-table, O(NlogN) preparation)

Secondly, this problem will be much easier, if there do not exist a query interval which is totally contained by another. With this special property, we can sort them by Li ascending, and thus their Ris should be ascending too. The key point is we can use an offline algorithm. We can firstly build the SA[] and get the LCP[] as same as the first problem. Also, we need to prepare for the RMQ problem. And then, maintain two pointers (for L and R) and move them according to the queries we have (the move is always from the star to the end.). Meanwhile, maintain a data structure (such as balanced binary search tree. You can use the set in C++ STL too) to keep the order of suffixes in the query range. After each insert/remove, we can know the sum of the LCPs of the currently adjacent suffixes too. Therefore, we are able to answer all queries after the travel of all queries. The time complexity is O(MlogN).

Finally, we can face the original problems. The basic idea is similar to the problem Powerful array in Codeforces. That is firstly dividing the 1…N into B blocks and then sorting all queries by the index of the block that contained Li. Use Ri as the tie breaker, i.e. the queries within the same starting block are sorted by their Ri. They, still apply the similar solution in the second problem: two pointers and balanced binary search tree. But in this case, the pointer may trace back. But, fortunately, the left pointer will always within the block if trace back while the right pointer will trace back at most B times. Therefore, the total time of the moving of pointers can be controlled by M * N / B + B * N. If we set B = sqrt(N), then the total complexity will be O(max(N, M)sqrt(N)logN).


Author’s solution can be found here.
Tester’s solution can be found here.

1 Like

Now that’s what I call an awesome editorial. Ideas to solving the current problem and providing links and discussing almost similar problems. Helps one to level up immensely. Hats off :slight_smile:


Please help me with my code. I tried a different logic.
my code:

i have stored all possible substrings from a given Li to Ri in an array and then eliminated the duplicate ones.

what was the required complexity for suffix array construction??

would O(N*(logN)^2) work here??

Of course it will work. The total complexity of colution is O(N*sqrt(N)log(N)), but O(Nlog^2(N)) is much smaller.

@ballon_ziq thanks :slight_smile:

i am getting SIGSEGV error!!