×

MEDIUM

# PROBLEM

You are given a sequence of heights. You wish to find the number of unique contiguous subsequences in heights, such that, for any two of them

• Either their length is different
• The sequences formed by taking difference between consecutive heights, is different

# QUICK EXPLANATION

We can take the difference between consecutive heights and work on only those. We wish to find the number of unique contiguous subsequences (we consider that two subsequences are different anyway if they have different lengths).

This is equivalent to the problem of finding the number of unique substrings in a string. For the rest of this discussion, we will refer to the length of the string as N.

The naive method to finding the number of unique substrings in a string is to consider each sub-string and find the number of unique ones. This algorithm can be implemented in O(N2) by using a smart hash function and a hash table.

Another O(N2) approach uses the Knuth Morris Pratt algorithm. The failure function may be calculated for each suffix of the string and used to eliminate repeated substrings. We will not discusss this approach in detail because in this problem the length of the string can be as much as 99,999. This means that we cannot use any O(N2) approach.

You may already know about the Suffix Arrays and LCP Arrays. We can use them to solve this problem in O(N). Refer to the this paper to learn the Skew Algorithm for sorting suffixes and finding LCP Array in O(N) time.

We will describe an algorithm to construct the Suffix Array in order O(Nlog2N) and then find the Longest Common Prefixes between adjacent suffixes in the Suffix Array in O(log N) time. The explanation below is largely based on this resource.

We will then see how to find the number of unique substrings of a string by using the LCP Array.

# EXPLANATION

Take the example of the following string

mississippi

length = 11


The suffixes of the above string are

 0: mississippi
1: ississippi
2: ssissippi
3: sissippi
4: issippi
5: ssippi
6: sippi
7: ippi
8: ppi
9: pi
10: i


If we were to sort all the above suffixes in lexicographic order, we would obtain the string

10: i
7: ippi
4: issippi
1: ississippi
0: mississippi
9: pi
8: ppi
6: sippi
3: sissippi
5: ssippi
2: ssissippi


Whenever we say Suffix Array, we mean the integer array representing the order in which the suffixes appear when sorted.

[ 10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2 ]


A naive way in which you can create the above array is to start with the array

[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]


And then apply the standard quick-sort algorithm using a comparator function that compares two strings. The complexity of sorting the array and obtaining the Suffix Array would be O(N2log N).

But surely we can do better. We know that we are not sorting random strings, rather, strings that are suffixes of the same string.

Let us consider the original array or suffixes, sorted only according to the first 2 character. If the first 2 character is the same, we consider that the strings have the same sort index.

Sort-Index Suffix-Index

0       10: i
1        7: ippi
2        1: ississippi
2        4: issippi
3        0: mississippi
4        9: pi
5        8: ppi
6        3: sissippi
6        6: sippi
7        2: ssissippi
7        5: ssippi


Now, we wish to use the above array, and sort the suffixes according to their first 4 characters. To achieve this, we can assign 2-tuples to each string. The first value in the 2-tuple is the sort-index of the respective suffix, from above. The second value in the 2-tuple is the sort-index of the suffix that starts 2 positions later, again from above.

If the length of the suffix is less than 2 characters, then we can keep the second value in the 2-tuple as -1.

Sort-Index Suffix-Index                Suffix-Index
after first 2 chars
and 2-tuple assigned

0       10: i               -1   (0, -1)
1        7: ippi             9   (1, 4)
2        1: ississippi       3   (2, 6)
2        4: issippi          6   (2, 6)
3        0: mississippi      2   (3, 7)
4        9: pi              -1   (4, -1)
5        8: ppi             10   (5, 0)
6        3: sissippi         5   (6, 7)
6        6: sippi            8   (6, 5)
7        2: ssissippi        4   (7, 2)
7        5: ssippi           7   (7, 1)


Now, we can call quick-sort and sort the suffixes according to their first 4 characters by using the 2-tuples we constructed above! The result would be

Sort-Index Suffix-Index

0       10: i
1        7: ippi
2        1: ississippi
2        4: issippi
3        0: mississippi
4        9: pi
5        8: ppi
6        3: sissippi
7        6: sippi
8        2: ssissippi
9        5: ssippi


Similarly constructing the 2-tuples and performing quick-sort again will give us suffixes sorted by their first 8 characters.

Thus, we can sort the suffixes by the following pseudo-code

SortIndex[][] = { 0 }

for i = 0 to N-1
SortIndex[0][i] = order index of the character at A[i]

doneTill = 1
step = 1

while doneTill < N
L[] = { (0,0,0) } // Array of 3 tuples

for i = 0 to N-1
L[i] = ( SortIndex[step - 1][i],
SortIndex[step - 1][i + doneTill],
i
)
// We need to store the value of i to be able to retrieve the index

sort L

for i = 0 to N-1
SortIndex[step][L[i].thirdValue] =
0, if L[i] and L[i-1] have the same first and second values
i, otherwise

++step
doneTill *= 2



The above algorithm will find the Suffix Array in O(N log2 N). This is, in fact, enough for this problem.

We can use the SortIndex array we constructed above to find the Longest Common Prefix, between any two prefixes.

FindLCP (x, y)

for k = ceil(log N) to 0
if SortIndex[k][x] = SortIndex[k][y]
// sort-index is same if the first k characters are same
// now we wish to find the characters that are same in the remaining strings
x += 2k
y += 2k


The LCP Array is the array of Longest Common Prefixes between the ith suffix and the (i-1)th suffix in the Suffix Array. The above algorithm needs to be called N times to build the LCP Array in a total of O(N log N) time.

The LCP Array for the string we have been using till now is

LCP SA

0   10: i
1    7: ippi
1    4: issippi
4    1: ississippi
0    0: mississippi
0    9: pi
1    8: ppi
0    6: sippi
2    3: sissippi
1    5: ssippi
3    2: ssissippi


Once we know the LCP array, we can find the number of unique substrings by simply ignoring all the prefixes of each suffix that were common from the suffix before it - since they were already counted as the prefix of the last suffix. Hence the answer is

(Sum of length of all Suffixes) - (Sum of values in LCP Array)

# SETTER'S SOLUTION

Can be found here.

# TESTER'S SOLUTION

Can be found here.

This question is marked "community wiki".

2.4k128183169
accept rate: 14%

2

This was the problem I most wanted to solve during the contest... Nice editorial @gamabunta...

(20 Jul '13, 18:04) 4★

 3 You can also build a suffix automaton(DAWG) and just count the number of paths from the source node, by using DP or recursively with memoization. answered 20 Jul '13, 18:51 575●1●5●11 accept rate: 13% @c0d3junki3 : could you please sujjest some good article on suffix automation ? (20 Jul '13, 19:50) 1 http://cbse.soe.ucsc.edu/sites/default/files/smallest_automaton1985.pdf And the source code: http://e-maxx.ru/algo/suffix_automata These are the two things i used to learn it. (20 Jul '13, 22:56) Thanks :) One small request i have few doubt on emaxx.ru algorithms. could we talk some where else or give me your email id.mine is sandeep.masum4685@gmail.com (20 Jul '13, 23:12)
 1 The Setter's solution is based on Suffix Automation and not Suffix arrays! answered 21 Jul '13, 19:10 4★ash1794 225●2●4●7 accept rate: 11%
 1 I have implemented O(n*log^2n) solution based on suffix array and kasai's algorithm somewhat similar to given on geeks for geeks yet getting WA. Can somebody help me figure out bug or can somebody provide additional test cases. HERE is the link to my code. Thanks in advance ! answered 07 Dec '16, 20:01 2★qazasd 11●1 accept rate: 0% Answering one and a half year later , hope this helps , Change the code to this:::  long long int ans=0; for(i=0;i
 0 I still can't understand how to use suffix array to solve the problem. I tried to apply what's in the editorial but in vane. Can any1 provide an example and explain it step by step? answered 31 Jul '13, 18:33 4★bondoc73 106●2●4●10 accept rate: 10% Thanks a lot no need for the example. I understand it now. (31 Jul '13, 23:41) bondoc734★
 0 Doesn't the tester's solution builds suffix array in O(n) time? answered 09 Aug '13, 19:26 99●3●5●9 accept rate: 0%
 0 I am a bit confused. After sorting the arrays on the basis of the first 4 characters how come sissippi is placed over sippi. answered 07 Dec '14, 17:19 4★foolan 35●1●6 accept rate: 0%
 0 How do we get to know that one has to use suffix array for a particular problem ? Any help would be appreciated :) answered 09 May '18, 18:31 318●1●10 accept rate: 1%
 toggle preview community wiki:
Preview

By Email:

Markdown Basics

• *italic* or _italic_
• **bold** or __bold__
• image?![alt text](/path/img.jpg "title")
• numbered list: 1. Foo 2. Bar
• to add a line break simply add two spaces to where you would like the new line to be.
• basic HTML tags are also supported
• mathemetical formulas in Latex between \$ symbol

Question tags:

×15,682
×2,595
×643
×91
×36
×20

question asked: 20 Jul '13, 14:33

question was seen: 8,650 times

last updated: 09 May '18, 19:37