 # CLUNQUE EDITORIAL

Author and Editorialist: Aman Kumar Singh

Medium

# PREREQUISITES

Suffix array, LCP array, Segment tree or Fenwick tree.

# PROBLEM

Given a string S, a substring of string S is said to be unique if it has a number of occurrence in S exactly equal to 1, Given queries of the form
L R K, find the lexicographically smallest substring S[x, y] of length at least K such that L \leq x \leq y \leq R.

# EXPLANATION:

Let \forall i \ni 1 \leq i \leq N, p_{i} be the minimum index such that S[i, p_{i}] is a unique substring. Then all the substrings S[i, p_{i}+ 1], S[i, p_{i}+ 2], S[i, p_{i}+ 3], … , S[i, N], must be a unique substring.

We require Suffix array and LCP array to find p_{i}. Here is a nice video tutorial with practice problems.
Let SA be the suffix array of string S.
Let \forall i \ni 1 \leq i \leq N,
l_{i} = max(LCP(SA[i-1], SA[i]), LCP(SA[i], SA[i+1]) …(1)
If l_{i} is not equal to the length of i^{th} suffix, p_{SA[i]} = SA[i]+ l_{i}, else p_{SA[i]} = INF
Note: For index 1 and N, we actually only take the LCP’s with index 2 and N-1 respectively.

We’ll be discussing a few lemmas after which you can convince yourself with the above claim:

Lemma 1: The substring S[SA[i], SA[i] + l_{i}] is a unique substring if l_{i} is not equal to the length of i^{th} suffix.
Proof: SA denote the suffixes sorted in lexicographical order,
i.e. S[SA[i - 1], N] < S[SA[i], N] < S[SA[i+1], N]. We prove the above lemma by contradiction.
Let S[SA[i], SA[i] + l_{i}] be not a unique unique substring, i.e. there exist another substring of length l_{i} + 1 such that it is equal to S[SA[i], SA[i] + l_{i}]. Since S[SA[i], N] < S[SA[i+1], N],
l_{i} + 1 \leq LCP(SA[i], SA[i+1]), which is clearly a contradiction since from relation (1), LCP(SA[i], SA[i+1]) \leq l_{i}.

Lemma 2: There does not exist x_{i} < l_{i}, such that S[SA[i], SA[i] + x_{i}] is a unique substring.
Proof: Let x_{i} = l_{i} - 1 and S[SA[i], SA[i] + l_{i} - 1] be a unique substring, i.e. there does not exist another substring of length l_{i} such that it is equal to S[SA[i], SA[i] + l_{i} - 1].
But from relation (1), atleast one of LCP(SA[i-1], SA[i]), LCP(SA[i], SA[i+1]) must be l_{i}, which clearly denotes that there exists another substring of length l_{i} which is equal to S[SA[i], SA[i] + l_{i} - 1] so its a contradiction again.

Let ord_{i} be the inverse SA_{i} at each index. It can be calculated simply as:

for i in [1, N]:
ord[SA[i]] = i


Let’s consider a list of pairs seg such that seg_{i} = (p_{i}, i). Sort these pairs in non-decreasing order of p_{i}. We also require min-segment tree or fenwick tree.
Now let’s do a sweep-line for i in [1, N]. After sorting seg, let (x, y) pair denote seg_{j}. For all such segments such that seg_{j}.x \leq i, update segtree at index seg_{j}.y with ord[seg_{j}.y]. For all queries (L, R, K) such that R=i, query for minimum in segment tree in the range [L, R - K + 1], since we require the substring to be of length at least K.

Why querying minimum in the range [L, R - K + 1] works ?
Since the substrings are unique now, each of them can be uniquely determined and compared based on ord_{i} values.

The pseudo code for this part:

Let st = min_segment_tree
j = 0
for i in [1, N]:
while(j < N and seg[j].x <= i):
st.update(seg[j].y, ord[seg[j].y])
j++
for query such that query.R = i:
mn = st.min_query(query.L, query.R - query.K + 1)
if(mn != INF):
ans[query.id] = {SA[mn], max(SA[mn] + query.K - 1, p[SA[mn]])}
else:
ans[query.id] = {-1, -1}


C++ solution
Java solution

5 Likes