UNIQUE - Editorial

cook43
editorial
hard
lcp
segment-tree
string
suffix-array
unique

#1

PROBLEM LINK:

Practice
Contest

Author: Shiplu Hawlader
Tester: Tasnim Imran Sunny
Editorialist: Lalit Kundu

DIFFICULTY:

HARD

PREREQUISITES:

Suffix Array
LCP Array
Segment Tree

PROBLEM:

We have a string of length N(<=10^5). For each index i=0 to N-1, we need to find a unique substring of minimum possible length which contains the index i.

EXPLANATION:

So, let’s begin with Suffix Array and LCP array. This has already been explained so many times earlier and I don’t think I can do better than those explanations. So here is an awesome tutorial on Suffix Arrays and LCP Array by @kuruma. If you don’t know about Suffix Array don’t continue before reading this tutorial. Also read about segment trees here and here.

If we have build the suffix array and LCP array.
For any i,
Let p = max(LCP(suffix[i-1],suffix*), LCP(suffix*,suffix[i+1]))
If p is not equal to length of i’th suffix, then all the substrings, S[suffix* to suffix*+p], S[suffix* to suffix*+p+1],…, S[suffix* to N-1] are unique substrings of length p+1, p+2, … respectively. How?

Suppose three suffices of a string after making suffix array are

suffix [ i - 1 ] : abccxyz…
suffix [ i ] : abcdfgh…
suffix [ i + 1 ] : abcdqrs…

LCP( i - 1 , i) = abc
LCP( i, i + 1 ) = abcd

So starting from index suffix*, at most 4 length substring “abcd” are common in another position suffix[i+1].
So we can say, “abcdf” is unique because if it is not, then there should be another suffix “abcdf…” just after suffix* in the suffix array.
As “abcdf” does not occurs anywhere else, also “abcdfg”, “abcdfgh” and so on are unique.

So, we know that from suffix* to suffix*+p (both inclusive) the minimum length unique substring is p + 1.
Similarly indices suffix*+p+1, suffix*+p+2, … etc. also part of unique substrings which starts from i. But we have to handles these two case differently.
Let us handle first case first.
We’ll handle this using segment tree. We have update queries of the type (i,j,p) in which we do:

for k=i to j:   
   if tree[k]>p:   
       tree[k]=p   

So, we sort in increasing order of p and star updating ranges beginning from largest p.
So, we keep an array of pair of q*=(p*,i). We sort it in increasing order. And then going backwards we update in segment tree the values. But we are not storing p* in the segment tree. We are storing suffix* in the segment tree. We will extract the minimum length from these indexes later. So we do this:

for i=n-1 to 0:    
    pos=q*.second    
    pp=suffix[pos]   
    if ( pp + q*.first - 1 < n ):   
        update( pp, pp + q*.first - 1, pp)   

Now, we read all values from the segment tree and suppose in a array posi we store the values taken from the segment tree.
We can computer array length from posi by doing this:

for i=0 to n-1:    
    length*=((posi*<0)?n+1:p[rank[posi*]])    

Now posi* and length* might not be storing the optimal answer. Because second case might give us the optimal cases. Indices suffix*+p+1, suffix*+p+2, … etc. also part of unique substrings which starts from i. In this case larger the suffix*, smaller the substring will be. Since we need suffix* to be largest, we can do this in one for loop. Pseudo code:

memset(g,-1,sizeof(g))    
for i=0 to n-1:   
    g[suffix*+p*]=suffix*   
t=-1   
for i=0 to n-1:   
    t=max(t,g*)   
    if t<0: continue    
    l=i-t+1   
    if l<length* or ( l==length* and rank[t]<rank[posi*]):   
    // means we have good solution, if l==length and lexigraphically smaller, we take the smaller one   
        posi*=t,length*=l   

Now, for each index we have starting(0-based index) posi* and length equal to length*.

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution can be found here.
Tester’s solution can be found here.

RELATED PROBLEMS:

This pdf discusses Suffix Array and various questions in detail.
GERALD3
MOU1H
DIFTRIP


#2

I think while filling g[] you would overwrite some values. for example if p[3]=3 p[5]=5
suffix[3]=5 and suffix[5]=3 first g[8]=5 then g[8] overwrites with g[8]=3 but g[8] should have had 5 as it is closest to it


#3

Can someone please provide better explaination for last part of editorial? I am unable to understand it.


#4

I wrote my own solution while this was being published, haha. See

http://discuss.codechef.com/questions/38547/editorial-of-httpwwwcodechefcomcook43problemsunique-problem-is-not-found/38569