SUBPROB - Editorial



Author: Naman Jain
Tester: Felipe Mota




KMP, Heavy Light Trick, Euler Tour of a Tree


You are given two strings P and Q. Consider some uniformly randomly and independently chosen non-empty prefix X and non-empty suffix Y of P. A string Z is formed by concatenating X and Y in this order. Find the probability that Z is a substring of Q.


With the help of KMP, the problem can be reduced to a data-structure problem on trees which can then be solved using heavy light trick.


Let n = |P| , m = |Q|
Let x be a non-empty prefix of P ending at i. We calculate the number of possible y s.t. y is a suffix of P and x+y is a substring of Q. For this, we see all occurrences of x in Q. Let one such occurrence be ending at index j. Then we need to consider prefixes of Q[(j+1)....m] which are suffixes of P, Let this set be A.

Consider a tree T_1, of size n + m + 1, constructed as follows :

  1. Node 0 is the root
  2. For 1 \le v \le n, the parent[v] = \pi_P[v]. [ \pi denotes the prefix function, read more about it here ]
  3. For n+1 \le v \le n+m, parent[v] = length of longest suffix of Q[1...(v-n)], which is a prefix of P.

In other words, nodes of T_1 represent the prefixes of P and Q, the the parent of a node is its longest suffix which is a prefix of P.

Similarly, we can construct T_2, with reverse(P) and reverse(Q)

We can observe that all occurrences of x in Q are represented by the Q-nodes [nodes representing substrings of Q] in the subtree rooted at i in T1. Also, the set A described above is equivalent to the ancestors of the node (i+j+1) [ the node representing suffix of Q starting at j+1 ] in T_2.
So basically what we need to do is, for every Q-node in the subtree of x in T_1, we want to collect the ancestors of some node in T_2, and calculate the size of the set of all these ancestors.

So this reduces to a problem similar to the following :
You are given 2 trees T_1, T_2 and a mapping of nodes from T_1 to T_2. For a node u in T_1, let B(u) be the set of nodes in the subtree rooted at u in T_1. Let C(u) be the set of nodes corresponding [according to mapping] to B(u) in T_2. Let D(u) be the union of the ancestors of C(u) in T_2. Then find the cardinality of D(u) for all u.

There can be several approaches to solve this reduced problem. Here is one of them :
The set D(u) for a node u can be formed by merging the sets D(child_u) for all children child_u of u. We can optimize this merging step by using heavy light trick.
During the heavy light dfs, We maintain the set C described above, and calculate |D(u)| for all nodes. For adding a node to the set , we need to find the number of extra ancestors it would contribute to D(u). The can be efficiently done by keeping the set C(u) ordered according to the euler order on T_2. Now the contribution of adding a node to this set can be calculated by finding the lca with the neighboring nodes in the current set and choosing the closer one of them. The contribution will be the distance to this lca. This solution has a complexity of O(N \cdot log^2N), the log factors coming from the heavy-light trick and querying/maintaining the set of nodes.

Here are some resources to the algorithms, data-structures used in the solution :

  1. KMP
  2. Heavy Light Trick
  3. Euler Tour of a Tree

The AC submissions in the contest have several interesting approaches to the problem, many of them quite different from the solution described above. Feel free to check them out.


Setter's Solution
#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
using namespace __gnu_pbds;
using namespace std;
template<class T> ostream& operator<<(ostream &os, vector<T> V) {
 os << "[ "; for(auto v : V) os << v << " "; return os << "]";}
template<class L, class R> ostream& operator<<(ostream &os, pair<L,R> P) {
	return os << "(" << P.first << "," << P.second << ")";}
#define TRACE
#ifdef TRACE
#define trace(...) __f(#__VA_ARGS__, __VA_ARGS__)
template <typename Arg1>
void __f(const char* name, Arg1&& arg1){
	cout << name << " : " << arg1 << std::endl;
template <typename Arg1, typename... Args>
void __f(const char* names, Arg1&& arg1, Args&&... args){
	const char* comma = strchr(names + 1, ',');cout.write(names, comma - names) << " : " << arg1<<" | ";__f(comma+1, args...);
#define trace(...) 1
#define ll long long
#define ld long double
#define vll vector<ll>
#define pll pair<ll,ll>
#define vpll vector<pll>
#define I insert 
#define pb push_back
#define F first
#define S second
#define endl "\n"
#define vi vector<int>
#define pii pair<int, int>
#define vpii vector< pii >
const int mod = 998244353;
inline int mul(int a,int b){return (a*1ll*b)%mod;}
inline int add(int a,int b){a+=b;if(a>=mod)a-=mod;return a;}
inline int power(int a,int b){int rt=1;while(b>0){if(b&1)rt=mul(rt,a);a=mul(a,a);b>>=1;}return rt;}
inline int inv(int a){return power(a,mod-2);}
inline void modadd(int &a,int &b){a+=b;if(a>=mod)a-=mod;} 

const int M = 2e5+15;

void Kmp(string& s, int l, vi& pref) {
	pref[0] = 0;
	int cur = 0;
	for(int i=1; i<l; i++){
		while (true) {
			if(s[i] == s[cur]) { pref[i] = ++cur; break; }
			if(cur == 0){ pref[i] = 0; break; }
			cur = pref[cur-1]; 

int t1, t2;
vector<vi> g1(M), g2(M);
vi par1(M), par2(M);

int n, m;

void make_tree(int& t, vector<vi>& g, string& p, string& q, vi& par) {
	vi pref(n);
	Kmp(p, n, pref);

	t = n + m + 1;

	par[0] = 0;

	for(int i=1; i<=n; i++) {
		par[i] = pref[i-1];

	int cur = 0;

	for(int i=n+1; i<t; i++) {
		int x = i - (n+1);
		while (true) {
			if (cur == n) { cur = pref[cur-1]; }
			if (cur >= n) assert(0);
			if (q[x] == p[cur]) { par[i] = ++cur; break; }
			if (cur == 0) { par[i] = 0; break; }
			cur = pref[cur-1];


const int L = 18; // log2(2e5) = 17.61

int anc[M][L];
int dep[M];
int euler[M];
int cnt;

void dfsT2(int x) {
	euler[x] = cnt++; 
	for(auto z : g2[x]) {
		dep[z] = dep[x] + 1;

void processT2() {
	for (int j=0; j<L; j++) {
		for (int i=0; i<t2; i++) {
			if (j==0) anc[i][0] = par2[i];
			else anc[i][j] = anc[anc[i][j-1]][j-1];
	dep[0] = 0;
	cnt = 0; 

int lcaT2(int x, int y) {
	if(dep[x] < dep[y]) swap(x, y);
	int dif = dep[x] - dep[y];
	int a;
	for(int i=0, a=1; dif>0; i++, a<<=1) {
		if(dif & a) {
			x = anc[x][i]; dif ^= a;

	if(x == y) return x;
	for(int i=L-1; i>=0; i--) {
		if(anc[x][i] != anc[y][i]) {
			x = anc[x][i];
			y = anc[y][i];
	return anc[x][0];

int sub[M], hc[M];

void dfsT1(int x) {
	sub[x] = 1; 
	pii mx = {0, -1};
	for(auto z : g1[x]) {
		sub[x] += sub[z];
		mx = max(mx, {sub[z], z});
	hc[x] = mx.S;

void processT1(){

int cur_cnt = 0;
set<pii> nodes;

inline int convert(int x) {
	x -= n; x = m + 1 - x;
	x += n; return x; 

void add(int x) {
	x = convert(x);
	int ind = euler[x]; 
	set<pii>::iterator nr;
	int lca = -1;
	if( (nr = nodes.lower_bound({ind, 0})) != nodes.end() ) {
		lca = lcaT2(nr->S, x);
	if ( nr != nodes.begin() ) {

		int tmp = lcaT2(nr->S, x);
		if((lca==-1) || dep[tmp] > dep[lca]) lca = tmp;
	int ext = (dep[x]-1) - ((lca == -1) ? -1: dep[lca]) ;
	cur_cnt += ext;
	nodes.I({ind, x});

void dfsAdd(int x) {
	for (auto z:g1[x]) {
	if(x > n && x < n+m) add(x+1);

int Ans;

void dfsHL(int x) {
	for (auto z : g1[x]) {
		if (z == hc[x]) continue;
		cur_cnt = 0;
	if(hc[x] !=-1) dfsHL(hc[x]);
	for(auto z : g1[x]) {
		if (z == hc[x]) continue; 
	if (x > n &&  x < n + m) add(x+1);
	if (x !=0 && x<=n) {
		if(cur_cnt > 0) Ans = add(Ans, cur_cnt - 1);

void solve() {
	string P, Q; 
	cin >> P >> Q ;
	string revP = P; reverse(revP.begin(), revP.end()); 
	string revQ = Q; reverse(revQ.begin(), revQ.end()); 

	n = P.size();
	m = Q.size();

	for(int i=0; i<= n+m+5; i++) {

	make_tree(t1, g1, P, Q, par1);
	make_tree(t2, g2, revP, revQ, par2);


	cur_cnt = 0; nodes.clear(); Ans = 0;

	// cout<<"Count: "<< Ans<<"\n";
	Ans = mul(Ans, inv(mul(n, n)));

int main(){
	int T; cin>>T;

There is O(n log n) solution with Suffix automaton + Segment tree + Z-function.

I tried a lot figuring out how suffix automata can be used, couldnot come up with any leads before arriving at the intended solution. I would be glad if you share your approach

I solved it using Suffix automaton + Suffix Array (with Sparse Table to calculate LCP) + Persistent Segment Tree (and Fenwick Tree in some parts to reduce hidden constant).

I will explain my approach.


We can reformulate the problem as follow:
We want to find the probability U/V
V= |P| * |P|
U= \text{\# combinations of a prefix and a suffix of }P \text{ that produces a substring of } Q

In order to calculate U we can do the following: for each unique substring of Q, let’s calculate in how many ways can you partition it in a two parts, such that left part is a prefix of P and the right part is a suffix of P. It is not hard to see that this procedure will not count anything twice.

Small proof

By contradiction if you count two equal pairs of (pref, suff), their combination will be the same string, and we are only counting unique substrings

How to only consider unique substrings of Q ? We can do this with suffix automaton.
Let’s check some basic properties of the suffix automaton:

Properties of Suffix Automaton

If we build a suffix automaton of a string s with root (initial state) t_0:

  • Every path from the root of the DAG to some node u will produce a unique substring of s
  • Every node u of the DAG correspond to some set of strings that have the same set of ending position, and let’s say that the longest string of this state has lenght maxLen and the shortest has lenght minLen. Then that state contains maxLen - minLen strings, each of them with a unique length in the range [minLen , maxLen]. Moreover, all of them are suffixes of the largest one.

More information can be found here.

Now, let’s build the automaton for Q. Each node u of the automaton corresponds to a set of endping positions. We can pick one of them to simplify the analysis. For example let’s take this picture from cp-algorithms that corresponds to the suffix automaton (the left graph) of the string Q="abcbc". If we see for example the state "bc", this state have the strings "c", "bc" and the set of end positions \{2, 4 \} (0-indexed), which means that all appearance of "bc" and "c" in Q ends in positions 2 or 4. So in that case, we can pick, for example the ending position 2. This will helps us to make the calculations for every unique substrings of Q so we’ll not be counting twice.

So, for each node u of the automaton, we have the substring that ends in position r and start in l \in [l_1, l_2], where l_1=r-maxLen[u], l2=r-minLen[u]. For example, in the string "abcbc", the node "bc" have r=2, l1=1, l2=2.

Quadratic Solution

So now, we have this problem:
We have a bunch of triplets (l_1, l_2, r) and for each triplet, we need to count, for each l \in [l_1, l_2] , in how many ways can we partition the substring Q[l...r] into a combination of prefix + suffix of P.

Let’s analyze how to answer this for a fixed (l,r) \rightarrow Q[l...r]. Let’s define maxPref[l] as the maximum k such that Q[l...k] is a prefix of P and maxSuf[r] as the minimum k such that Q[k...r] is a suffix of P. Obviously, every i \in [l, maxPref[l] produce a string Q[l...i] that is also a prefix of P and something similar happens with maxSuf[r]. So with that values, you can easily calculate the number of ways to partition Q[l...r]. See the following image.

Obviously, to be able to have a valid combination, the intersection of this “red” interval and “blue” interval have to be non empty. Also you have to be careful to not count the combination of empty prefixes or empty suffixes. After some thinking, you can get the following formula (I replace maxPref with pref and maxSuf with suf to shorten the names):

max(0, min(pref[l], r - l) - max(0, r - l - suf[r])).

So now we have an O(n^2) solution (link to quadratic solution). Notice that we can calculate pref[i], suf[i] easily in O(n^2). I think the code is very simple in that case (it won’t be the case for the full task :cry:)

for each triplet (l1, l2, r)
    for l in [l1, l2]
        ans += max(0, min(pref[l], r - l) - max(0, r - l - suf[r]))

How to improve the time complexity ?

Optimal solution

Ok, we know that suffix automaton is linear, so that’s not the problem. But we need to fix 2 things:

  • We need to calculate pref[i] and suf[i] efficiently
  • We need to answer the queries efficiently
Improving pref and suf calculations

This is when Suffix Array and LCP comes handy. Maybe there are other approaches but I did the following to calculate pref[i] (it is very similar for suf[i]):

Let’s build the suffix array of the string t=Q + P (the concatenation). Now, let’s say we want to calculate pref[i] for i \in [0, |Q|-1]. Let’s say that the suffix of t that starts at i is in position pos1 in the suffix array, and the suffix of t that starts at |Q| is in position pos2 in the suffix array.
Then pref[i]= lcp(pos1, pos2).

More information about suffix array and LCP array can be found here.

All the construction of the suffix array can be done in O(n log n) (with n=|P| + |Q|). You can also use some data structure like Sparse table to answer the lcp queries. Overall complexity of all this process is O(n log n)

Answering the queries fast enough

Now the problem has been transform to a data structure problem. I suppose there are many approaches here but I used Fenwick Tree and Persistent Segment Tree.

Remember that we have queries in form of triplets (l1, l2, r) and we need to do this in each query for each l \in [l_1, l_2] : ans += max(0, min(pref[ l ], r - l) - max(0, r - l - suf[ r ])).

We are only interested in the intervals that have some intersections (remember the figure of red and blue intervals). So we need to be careful not taking into account those values that have no intersection, otherwise we’re going to have a decrease in the answer.

So it’s necessary that pref[l] + suf[r] \geq r - l, or equivalently l + pref[l] \geq r - suf[r].

Let’s solve the queries (triplets) offline. Sort the queries by r - suf[r] in non ascending order. Considered that all l are “inactive” and you would be “activating” them as we move along the queries. So that will guarantee that in each query, we’re only taking into account those valid l that have intersection with the query.

Let’s sort the l values by l + pref[l]. When we move along the queries, the value of r-suf[r] will decrease or stay the same, so you might add new values of l to take into account. When I say “add” this new values I mean adding to an appropiate data structure to handle the queries.

You can divide the queries into two parts: First sum min(pref[ l ], r - l) for all valid l in the range [l_1, l_2], then substract max(0, r - l - suf[ r ]) for all valid l in the range [l_1, l_2].

The second part is easy, you can considered all the l_1 \leq l \leq min(l_2, r - suf[r]), count how many of them are (cnt) and add the values of all the l. The total value to decrease is cnt * (r - suf[r]) - \sum_{\text{valid } l}{l}. This can be done with simple Fenwick Tree or Segment Tree.

First part is a little more complicated. Maybe I made an overkill here but I used Persistent Segment Tree, creating |Q| versions of Segment Trees, insertig there the values of pref[l] and l in non-descending order of pref[l] + l, so that when I need to answer the queries, I can binary search the right version of the Segment Tree. I don’t know if there is an easier approach here.

Overall Complexity of this process is O(|Q| log |Q|)

Warning ! I got TLE using Persistent Segment Tree with pointers. I had to switch to a non-pointer version.

Overall complexity O(n log n) (with n=|P| + |Q|)

This was not easy to code, but you can see my submission here: Link to my solution

I also want to thank the problem setter for such a great problem !