SSTRPREF - Editorial

PROBLEM LINK:

Contest - Division 3
Contest - Division 2
Contest - Division 1

DIFFICULTY:

MEDIUM

PREREQUISITES:

Z Function, Prefix Sums, RMQ

PROBLEM:

Given three strings S_1, S_2 and X, calculate the number of non-empty substrings of X that can be expressed as P+Q, where P and Q are prefixes of S_1 and S_2 respectively.

EXPLANATION:

Call a non-empty substring of X special if it can be expressed as the concatenation of some prefix of S_1 and S_2.
Then, we want to find the number of special substrings of X.

Claim: If X[L..R] is special, then X[L..R'] is also special, for all L\le R'\le R.

Proof

Let X[L..R] be expressed as P+Q, where P and Q are prefixes of S_1 and S_2 respectively.

To get X[L..R'], we can keep erasing the last element of Q (erase from P if Q becomes empty) till P+Q equals X[L..R']. All the while, P and Q still remain prefixes of S_1 and S_2, implying that X[L..R'] is also special.

Thus, proved.

Thus, if for each valid L, we can find the largest R_L such that X[L..R_L] is special, the answer to the problem is equivalent to

\sum (R_L-L+1)

Note: P and Q in the rest of the editorial exclusively represents some prefix substring of S_1 and S_2, respectively.

Consider some special substring X[L..R]=P+Q. If we want to maximize R for a fixed L, the combined length of P and Q should be maximised.
Then, for a fixed L, we can try each possible value of P and, greedily maximising the corresponding length of Q in each case, calculate the maximum possible combined length of P and Q.

Example

Let S_1 = \text{aab}, S_2 = \text{bbc}, X = \text{aabbc}.
For fixed L=1, the possible P and longest possible Q are:

  • P=``\text{aab}", Q=``\text{b}"
  • P=``\text{aa}", Q=``\text{bbc}"
  • P=``\text{a}", Q=``\text{}"
  • P=``\text{}", Q=``\text{}"

The maximum possible combined length is then 5 (case 2 above).

A thing to observe here is that - if P' is the longest possible P, then each valid P is a prefix of P'. This trivial observation motivates the solution.

Define Z_1[i] as the length of the longest prefix of S_1 that is a prefix of the suffix of X ending at index i. Define Z_2[i] for S_2 similarly.

How do I calculate this?

Both arrays can be calculated in linear time using the Z function.

If the Z-array of string S_1+``\text{\#}"+X is Z, then it is clear that array Z_1 is equal to the length |X| suffix array of Z. Similarly for Z_2 too.

Then, for a fixed L, the maximum possible length of P+Q is equal to

\max(Z_1[L]+Z_2[L+Z_1[L]],\\ (Z_1[L]-1)+Z_2[L+Z_1[L]-1],\\ (Z_1[L]-2)+Z_2[L+Z_1[L]-2],\\ \vdots\\ (0)+Z_2[L])

which can be written compactly as

\max_{0\le K\le Z_1[L]}\{(Z_1[L]-K)+Z_2[L+Z_1[L]-K]\}\\ =Z_1[L]+\max_{0\le K\le Z_1[L]}\{Z_2[L+Z_1[L]-K]-K\}

This value can be computed quickly using RMQ with suitable preprocessing.

How?

Define array B such that B[i] = Z_2[i]+i. Then the above equation can be rewritten as:

Z_1[L]+\max_{0\le K\le Z_1[L]}\{B[L+Z_1[L]-K]-(L+Z_1[L])\}\\ = Z_1[L]+\max_{0\le K\le Z_1[L]}\{B[L+Z_1[L]-K]\}-(L+Z_1[L])\\

which is further simplified to get

\max_{0\le K\le Z_1[L]}\{B[L+Z_1[L]-K]\}-L

The problem is now reduced to finding the Range Maximum on static array B, which can be done efficiently using a Sparse Table.

Using this, we can determine R_L for each L and compute the answer of the problem (using the formula at the end of the first section of the editorial).

TIME COMPLEXITY:

Computing Z_1 and Z_2 takes O(|S_1|+|X|) and O(|S_2|+|X|) respectively. Precomputing the sparse table for RMQ takes O(|X|\log|X|). With constant time RMQ, computing R_L for each L takes O(1).
The total complexity is therefore

O(|S_1|+|X|+|S_2|+|X|+|X|\log|X|+|X|)\approx\\ O(|S_1|+|S_2|+|X|\log|X|)

per test case.

SOLUTIONS:

Editorialist’s solution can be found here.
Author’s solution can be found here.


Experimental: For evaluation purposes, please rate the editorial (1 being poor and 5 excellent)

  • 1
  • 2
  • 3
  • 4
  • 5

0 voters

3 Likes