×

# UNIQUE - Editorial

Tester: Tasnim Imran Sunny
Editorialist: Lalit Kundu

HARD

# PROBLEM:

We have a string of length N(<=10^5). For each index i=0 to N-1, we need to find a unique substring of minimum possible length which contains the index i.

# EXPLANATION:

So, let's begin with Suffix Array and LCP array. This has already been explained so many times earlier and I don't think I can do better than those explanations. So here is an awesome tutorial on Suffix Arrays and LCP Array by @kuruma. If you don't know about Suffix Array don't continue before reading this tutorial. Also read about segment trees here and here.

If we have build the suffix array and LCP array.
For any i,
Let p = max(LCP(suffix[i-1],suffix[i]), LCP(suffix[i],suffix[i+1]))
If p is not equal to length of i'th suffix, then all the substrings, S[suffix[i] to suffix[i]+p], S[suffix[i] to suffix[i]+p+1],..., S[suffix[i] to N-1] are unique substrings of length p+1, p+2, ... respectively. How?

Suppose three suffices of a string after making suffix array are

suffix [ i - 1 ] : abccxyz...
suffix [ i ] : abcdfgh...
suffix [ i + 1 ] : abcdqrs...

LCP( i - 1 , i) = abc
LCP( i, i + 1 ) = abcd

So starting from index suffix[i], at most 4 length substring "abcd" are common in another position suffix[i+1].
So we can say, "abcdf" is unique because if it is not, then there should be another suffix "abcdf..." just after suffix[i] in the suffix array.
As "abcdf" does not occurs anywhere else, also "abcdfg", "abcdfgh" and so on are unique.

So, we know that from suffix[i] to suffix[i]+p (both inclusive) the minimum length unique substring is p + 1.
Similarly indices suffix[i]+p+1, suffix[i]+p+2, ... etc. also part of unique substrings which starts from i. But we have to handles these two case differently.
Let us handle first case first.
We'll handle this using segment tree. We have update queries of the type (i,j,p) in which we do:

for k=i to j:
if tree[k]>p:
tree[k]=p


So, we sort in increasing order of p and star updating ranges beginning from largest p.
So, we keep an array of pair of q[i]=(p[i],i). We sort it in increasing order. And then going backwards we update in segment tree the values. But we are not storing p[i] in the segment tree. We are storing suffix[i] in the segment tree. We will extract the minimum length from these indexes later. So we do this:

for i=n-1 to 0:
pos=q[i].second
pp=suffix[pos]
if ( pp + q[i].first - 1 < n ):
update( pp, pp + q[i].first - 1, pp)


Now, we read all values from the segment tree and suppose in a array posi we store the values taken from the segment tree.
We can computer array length from posi by doing this:

for i=0 to n-1:
length[i]=((posi[i]<0)?n+1:p[rank[posi[i]]])


Now posi[i] and length[i] might not be storing the optimal answer. Because second case might give us the optimal cases. Indices suffix[i]+p+1, suffix[i]+p+2, ... etc. also part of unique substrings which starts from i. In this case larger the suffix[i], smaller the substring will be. Since we need suffix[i] to be largest, we can do this in one for loop. Pseudo code:

memset(g,-1,sizeof(g))
for i=0 to n-1:
g[suffix[i]+p[i]]=suffix[i]
t=-1
for i=0 to n-1:
t=max(t,g[i])
if t<0: continue
l=i-t+1
if l<length[i] or ( l==length[i] and rank[t]<rank[posi[i]]):
// means we have good solution, if l==length and lexigraphically smaller, we take the smaller one
posi[i]=t,length[i]=l


Now, for each index we have starting(0-based index) posi[i] and length equal to length[i].

# AUTHOR'S AND TESTER'S SOLUTIONS:

Author's solution can be found here.
Tester's solution can be found here.

# RELATED PROBLEMS:

This pdf discusses Suffix Array and various questions in detail.
GERALD3
MOU1H
DIFTRIP

This question is marked "community wiki".

3.0k93164187
accept rate: 12%

1
(17 Feb '14, 19:49) 7★

 0 I think while filling g[] you would overwrite some values. for example if p[3]=3 p[5]=5 suffix[3]=5 and suffix[5]=3 first g[8]=5 then g[8] overwrites with g[8]=3 but g[8] should have had 5 as it is closest to it answered 24 Feb '14, 23:16 173●2●3●7 accept rate: 25%
 0 Can someone please provide better explaination for last part of editorial? I am unable to understand it. answered 23 Sep '17, 22:30 4★tapanr97 16●1 accept rate: 25%
 toggle preview community wiki:
Preview

By Email:

Markdown Basics

• *italic* or _italic_
• **bold** or __bold__
• image?![alt text](/path/img.jpg "title")
• numbered list: 1. Foo 2. Bar
• to add a line break simply add two spaces to where you would like the new line to be.
• basic HTML tags are also supported
• mathemetical formulas in Latex between \$ symbol

Question tags:

×15,680
×1,755
×1,343
×643
×90
×36
×11
×3

question asked: 17 Feb '14, 19:32

question was seen: 5,032 times

last updated: 23 Sep '17, 22:30