SSTRPREF2 - Editorial

infinitepro · October 22, 2021, 6:23am

PROBLEM LINK:

Contest

DIFFICULTY:

MEDIUM

PREREQUISITES:

Z-Function, KMP

PROBLEM:

Given strings S_1,S_2 and X. Determine the number of ordered pairs of strings (P,Q) such that:

P and Q are (possibly empty) prefixes of S_1 and S_2, respectively.
String P+Q is a substring of X.

EXPLANATION:

You may wish to have a look at a slightly similar problem first - SSTRPREF.

Note: P and Q in the editorial refer exclusively to some prefix of S_1 and S_2, respectively.

Definition: We call a string S good if it is a substring of X.

Observation: If P+Q is a good string, then P+Q' is also good, where Q' is some prefix string of Q.

Thus, for a fixed P, the number of Q' such that P+Q' is good = the length of the longest possible Q such that P+Q is good, plus 1 (for when Q' is empty). Summing this value over all prefix strings P gives us the answer to the problem!

Formally, let sol_i equal the largest integer x such that S_1[..i]+S_2[..x] is good. Then, our answer is equal to \sum (sol_i+1) over all valid i.
How do we calculate this array efficiently?

Suboptimal solution

Let J_i represent the set of all indices x such that S_1[..i] is a suffix of X[..x]. That is, the J_i is set of all x such that S_1[0..i]=X[(x-i)..x].
Also, let K_x represent the length of the longest prefix of S_2 that is also a prefix of X[x..].

Then, it is easy to see that

sol_i = \max(K_{x+1})\ \forall x\in J_i

Array K can be calculated in O(|S_2+X|) using Z-function. Using suitable preprocessing, it is possible to calculate J in worst-case, O(|S_1|+|X|^2) using KMP.

Calculating J takes a lot of time, so we need to optimise!

Optimal solution

Let L_x represent the set of all indices i such that S_1[..i] is a suffix of X[..x]. As before, let K_x represent the length of the longest prefix of S_2 that is also a prefix of X[x..].

Then, the equation (in the previous spoiler) can be rewritten as

sol_i=\max(K_{x+1})\ \forall x\text{ such that, } i\in L_x

which is algorithmically equivalent to doing, for every x

sol_i := \max(sol_i, K_{x+1})\ \forall i\in L_x

Clearly, this doesn’t improve the overall complexity (computing L is still quadratic in the worst case). But ~~I wouldn’t have mentioned it if it were useless~~ there is an interesting property that can be observed in the set L_x.

Observation: Let a,b\ (a<b) be any two elements in set L_x. Then, S_1[..a] is a suffix of S_1[..b]. Conversely, if b\in L_x and S_1[..a] is a suffix of S_1[..b], then a\in L_x.
(The proof’s of this is rather trivial, and is left to the reader as an exercise).

How is this helpful? It is easy to deduce from it, that for all x

L_x = \{L_{a_x}, b_x\}

where b_x is the greatest index such that S_1[..b_x] is a suffix of X[..x], and a_x is the greatest index such that S_1[..a_x] is a suffix of S_1[..b_x].

Then, instead of doing the following for every i,

sol_i := \max(sol_i, K_{x+1})\ \forall i\in \{L_{a_x}, b_x\}

we can do the operation on only i=b_x, and lazily mark all elements in L_{a_x} to be updated recursively (like in lazy segment trees). This approach is linear in its time complexity.

Thus, precomputing array K (using Z-function) and a_x and b_x (using KMP), we can solve the problem in linear time!

TIME COMPLEXITY:

Running KMP and Z-function takes O(|S_1+X|) and O(|S_2+X|) respectively. Calculating the solution takes O(|X|+|S_1|+|S_2|).
Thus, the total time complexity is

\approx O(|S_1|+|S_2|+|X|)

per test case.

SOLUTIONS:

Editorialist’s solution can be found here

Experimental: For evaluation purposes, please rate the editorial (1 being poor and 5 excellent)

1
2
3
4
5

0 voters