PROBLEM LINK:
Author: Vasia Antoniuk
Tester: Mahbubul Hasan and Sunny Aggarwal
Editorialist: Balajiganapathi Senthilnathan
Russian Translator: Sergey Kulik
Mandarian Translator: Minako Kojima
DIFFICULTY:
Medium
PREREQUISITES:
Z function, combinatorics
PROBLEM:
Given a string and a query k, output the number of ways we can choose k equal substrings from the string.
SHORT EXPLANATION
Let cnt[i] be the number of different substrings that occur in the string exactly i times. Then the answer for a particular k ( \le n) will be :
\sum_{i = k}^{n} cnt[i] * \binom{i}{k} we can calculate cnt[i] using suffix array/Z function.
EXPLANATION:
Let cnt[i] be the number of different substrings that occur in the string exactly i times.
For example, for the string “ababa”, the array will be:
cnt[1] = 4 → {“ababa”, “abab”, “baba”, “bab”} occur exactly once
cnt[2] = 4 → {“aba”, “ab”, “ba”, “b”} occur exactly twice
cnt[3] = 1 → {“a”} occur exactly thrice
cnt[4] = 0
cnt[5] = 0
Now suppose we want the answer for k = 2, i.e. number of ways to choose 2 equal strings. How do we calculate this using the cnt array?
cnt[1] is of no use as we need 2 equal string.
cnt[2] → There are 4 different substrings that occur 2 times. So, we can add 4 to the answer.
cnt[3] → There is only 1 substring that occurs 3 times. But we need only 2 times, so we can choose any 2 of the 3. So, \binom{3}{2} = 3
Summing them, for k = 2 we get 7.
Now let us focus on calculating the cnt array.
To calculate this let us loop through all suffixes of S.
For the suffix P = S[i..N], let us calculate the array Z[i..N] where Z[i] is the maximal equal substring starting from i that matches a prefix of S[i..N].
For example, for the above string consider i = 1, the whole string is the suffix
“ababa”.
Here the Z array would be
Z[1] = 5 → maximum prefix matching of “ababa” and “ababa”
Z[2] = 0 → maximum prefix matching of “ababa” and “baba”
Z[3] = 3 → maximum prefix matching of “ababa” and “aba”
Z[4] = 0 → maximum prefix matching of “ababa” and “ba”
Z[5] = 1 → maximum prefix matching of “ababa” and “a”
How is this array useful for calculating cnt?
From the above array we can conclude that we have 1 substring which can be choosen atleast 3 times. How? Because there are 3 entries in the above array that are greater than or equal to 1.
Similarly we can deduce for 2 (2 times), 3 (2 times), 4 (1 time ) and 5(1 time).
So we increment cnt[3] once, cnt[2] twice and cnt[1] twice.
We do this for all suffix of S. The only thing left is to observe that cnt[i] now actually has how many different strings can be choosen atleast i times. We want it to be exactly i times. This is simple: we just subtract from cnt[i] (cnt[i + 1] + cnt[i + 2] … + cnt[N]).
We can compute the Z function in O(n) time. See here for how to do that.
Time Complexity:
There are N suffixes and we take O(N) time for each suffix so total time is:
$O(N^2).