PROBLEM LINK:Author: Anudeep Nekkanti DIFFICULTY:MEDIUM HARD PREREQUISITES:Suffix Array, Suffix Tree, dfs, Segment tree, Fenwick Tree or BIT. PROBLEM:Given a string S and Q queries. In each query you are given an integer F. You need to find out number of substrings of S which occur at least F times in S. QUICK EXPLANATIONSuffix Array solution: Now the LCP array can be seen as a histogram with each bar having height LCP[i]. Then for some range [i,j] if minimum of [LCP[i], LCP[i+1].. LCP[j]] is m, then it means that there is a substring of length m, which is present at ji+2 indices in string S. Using this property we can find the solution in following way. Let D[i] represent the number of substrings which repeat exactly i times. If we can compute the array D, we can easily answer all the queries. For a given frequency F, required answer will be D[F] + D[F+1] + ... D[length of S]. For a bar of height H (LCP[i] = H), let us assume that we know the largest interval (L[i], R[i]) such that minimum of LCP array over the interval is equal to H, then it means that we have got a substring of length H, which repeats exactly R[i]L[i]+2 times. (See examples given in the explanation). Suffix Tree Solution EXPLANATIONFirst we will explain more about significance of array D and how it helps to solve the problem. Then the next section will elaborate on different ways of computing L[i] and R[i] for a given histogram H. Let us take an example string and create its suffix array. Consider the string S = "ABABBAABB".
Now A appears 4 times in S. A is the prefix of first 4 suffixes in the suffix array. Minimum of LCP[0], LCP[1], LCP[2] (0 based indexing) is 1 which represents that A has appeared 4 times. As pointed out in the quick explanation section, we can express LCP array as histogram with following values [1, 2, 3, 0, 1, 2, 1, 2]. You can also very easily verify the fact that if for some range[i, j], minimum of LCP array is m, then their exists a substring of length m occurring at j  i + 2 positions in the string S. L[i] for given LCP will be [0, 1, 2, 0, 4, 5, 4, 7]. Now let us find out how to update D using LCP. As we know that for the range [L[i], R[i]], minimum value of LCP array is equal to V. eg. Let us suppose we have following suffixes in the sorted order. When we are iterating over V = 2, Assume we are at position i = 0. V = 2 and R[i]  L[i] + 2 = 3. We will add 2 * 3 = 6 into D[3]. In other words it refers to adding substrings A and AA 3 times. The above mentioned approach has some small pitfalls which need to be taken care of. We are doing some overcounting in the approach. We need to fix that up. There are two kind of issues with the above approach which are explained in detail here. Issue 1 LCP = [1, 1, 3, 0, 2] L = [0, 0, 2, 0, 4] R = [2, 2, 2, 4, 4] When we are iterating over LCP array, Let us assume that current V = 1, If we are at i = 0 (ie at suffix A) we will count that A has occurred exactly 4 times (because R[0]  L[0] + 2 = 4, LCP[0] = 1). But when are considering V = 2, If we are at position i = 2 (ie we are at suffix ABAA), we have R[2] = 2, L[2] = 2. We will increment the count of D[(R[2]  L[2] + 2)] ie D[2] by 2, (we are updating count of occurrences of substrings A and AB by 2 more). But this is not correct, because in this approach we have counted A 6 times, which is wrong (You can easily see that A appears exactly 4 times). So There is over counting in our current method. We need to somehow get rid of this. Note that at suffix ABAA, the suffix just preceding it is AA. For V = 2 and suffix ABAA, when we count for A we have already counted the fact that substring A has appeared four times when the V = 1. So we can not say that all prefixes of length LCP[2] are counted exactly once. Notice that LCP[L[i]  1] is 1 and hence we are going to recount just the first prefix of suffix ABAA (ie A). So instead of directly saying that there are exactly LCP[i] substrings which appear exactly R[i]  L[i] + 2 times, we will say that there are exactly min(LCP[i], LCP[i]  LCP[L[i]  1], LCP[i]  LCP[R[i] + 1]) unique substrings appearing exactly R[i]  L[i] + 2 times. If you have not understood this fact, please take more examples and convince yourself. Issue 2 Our LCP array will be [1, 1]. As said earlier, we will iterating over LCP and our current V = 1.Assume that we are currently at the suffix A (ie i = 0), we have R[i] = 1.
We will update D[3] by 1 * 3. We can implement this by a two pointer method, We can maintain a pointer 'right' which denote the rightmost index which has been considered. Note that right is always nondecreasing, Hence it will guarantee that our algorithm takes O(N) time. Pseudo Code:
Now only difficulty in the entire algorithm described is to how to compute L[i] and R[i] for a given array LCP. Ways of Computing L[i], R[i] for all elements of Array ANow you have to solve the following problem. You are given an array A. For each element A[i] you have to find L[i] and R[i] where L[i] is the largest j <= i such that L[k] >= L[i] for all k lying between i and j. Similarly define R[i]. (In other words, For each index i, [L[i], R[i]] denotes the largest range such that all elements have minimum value equal to A[i].) Now if we can solve the problem of finding L[i], we can easily solve the problem of finding R[i] for each i (We can just reverse the array A and then finding R[i] is exactly the same as finding L[i]). Hence from now on, we will only look forward to solve problem of finding L[i]. If you have not solved the problem HISTOGRA on spoj, then try solving that. There are a lot of ways of solving this problem, you can read University of Ulm Local Contest judges solutions for problem H and solution mentioned on geeksforgeeks site. I will very briefly go through all of the methods mentioned there. Note that all these methods are explained in the details in the above given links. You are highly recommended to read those. Segment Tree Based Solution For reference solution, you can see editorialists solution to get an idea of its implementation. There are two segment trees, minSegmentTree and maxSegmentTree representing finding L[i] and R[i] respectively. Rest details are just similar to things explained before. Stack Based Solution Observe the following important properties: For sample implementation of this idea, you can refer setter solution. Disjoint Set Union Based solution Suffix Tree ApproachYou can solve the task easily by using suffix tree structure, You should do dfs over the suffix tree nodes. To get exact implementation details, you can view tester's solution. AUTHOR'S, TESTER'S AND EDITORIALIST's SOLUTIONS:Author's solution based on stack + suffix array
This question is marked "community wiki".
asked 19 May '14, 00:24

The setter's solutions seems to be wrong, as tested on one recent problem :P answered 08 Jun '15, 01:22

There is also rather easy solution using suffix automaton. Namely, for each state v you know how many substrings correspond to v and how many times each of them occurs in the string(number of paths from the start to v and from v to the end, respectively). Then you just multiply something and count some partial sums :D Also, you must take care of multiple final states in the automaton, but it's all details. answered 19 May '14, 00:37
@mmaxio: Suffix array solution is too complicated to explain :( But as many people wont be comfortable with suffix tree, hence I decide to explain the suffix array solution itself. Tester solved it using dfs over suffix tree.
(19 May '14, 08:09)

I have a question regarding tester's solution. Which algorithm is used to build the suffix tree ? or is it based on this link answered 19 May '14, 20:24

@All, If something is unclear or grammatically incorrect, please point that out. If you feel that any portion is not described well, then please don't hesitate to point that out. Thank you.
There should be a minor correction in the suffix array explanation:
The histogram must have values [1, 2 , 3 , 0, 1, 2, 1, 2].
How is the term (R[i]  L[i] + 2) obtained? Specifically, the "... + 2" part.
Let us consider that a string S has following suffixes
ABAA
ABCB
ABCD So LCP = [2, 3]. L = [0, 1] R = [1, 1] If you are at suffix ABAA(ie. i = 0), Your LCP[i] = 2, (R[i]  L[i] + 2 = 3), 3 tells you the suffix AB has appeared at 3 different positions ie at suffixes (ABAA, ABCB, ABCD, all the three suffixes)