A tutorial on Suffix Arrays

Now, we can call quick-sort and sort the suffixes according to their first 4 characters by using the 2-tuples we constructed above! The result would be

can anyone please explain the above line???qsort is called in which tuple…

1 Like

how tuples r used for sorting please explain???

1 Like

Really,easy to understand from this tutorial!!!:slight_smile:

16 Likes

Now, we can call quick-sort and sort the suffixes according to their first
4 characters by using the 2-tuples we
constructed above! The result would
be:

6 6: sissippi
7 3:
sippi
8 5: ssissippi
9 2: ssippi

Can anyone explain me the result of above step. How are the suffixes sorted according to first 4 characters after calling quick sort.
I mean, how is ‘sissippi’ is having a lower sort-index then ‘sippi’ and simmilarly how is ‘ssissippi’ having a lower sort-index than ‘ssippi’ ?
I think the correct order should be:

6 6: sippi
7 3:
sissippi
8 5: ssippi
9 2: ssissippi

2 Likes

In the naive approach, shouldn’t the loop go from i=0 to s.size() rather than i=0 to s.size()-1 ?

1 Like

well explained, but I think there should be some correction in the table given below:
Sort-Index Suffix-Index

0       10: i
1        7: ippi
2        1: ississippi
2        4: issippi
3        0: mississippi
4        9: pi
5        8: ppi
6        3: sissippi
7        6: sippi
8        2: ssissippi
9        5: ssippi
till (compared)4th character,  8 sort_index should be "ssippi" and 9th should "ssissippi" same in 6 & 7 sort_index
1 Like

Shouldn’t it be
“We can use the SortIndex array we constructed above to find the Longest Common Prefix, between any two suffixes”
instead of
“We can use the SortIndex array we constructed above to find the Longest Common Prefix, between any two prefixes.”

1 Like

I am a beginner. Very well written sir. It was a deep and helpful insight.

1 Like

I think the final answer given is wrong. It should be

sort-Index Suffix-Index

0       10: i
1        7: ippi
2        1: ississippi
2        4: issippi
3        0: mississippi
4        9: pi
5        8: ppi
6        6: sippi
7        3: sissippi
8        5: ssippi
9        2: ssissippi

This was awesome!
Thanks Bruno.

while sorting the suffixes according to the first 4 characters, why is “3” not assigned in the second tuple ??

shouldn’t in the naive approach

for(int i = 0; i < s.size();i++)
{
    m[s.substr(i,s.size()-i)] = i;
    v.push_back(s.substr(i,s.size()-i));
}

should be:

for(int i = 0; i < s.size();i++)
{
    m[s.substr(i,s.size()-1)] = i;
    v.push_back(s.substr(i,s.size()-1));
}

???

notice the size() - 1 instead of size()-i

very good explanation…

Another nice tutorial :slight_smile:

Really Helpful! Cheers :smiley:

I am getting wrong suffix array for input
ebbacb

Actual output is

5 3 2 0 4 1

While It should be

3 5 2 1 4 0

The N((log N)^2) says “we can use the property that all the strings are related” . But it is not very clear to me that what we are using , can’t we use the same thing for normal string comparison ?

@gamabunta : Will you please help me in explaining that ?

My implementation without the LCP array :-

Did anyone actually run the C++ SortIndex code? Did it actually give you the correct answer?

I find this line is a bit confusion as after P[stp][previous-position] = i which is a new position after sorting, we lose
track of the actual value that we initialized at the earlier step.

P[stp][L[i].p] =i> 0 && L[i].nr[0]==L[i-1].nr[0] && L[i].nr[1] == L[i- 1].nr[1] ? P[stp][L[i-1].p] : i;

Hence, when the loop starts the next round, all P[stp-1] have is no longer any values related to the original string and sub-strings.
I am puzzling now how this line could make things work they way we think it should be.

Furthermore, the C++ code is exactly in this paper: http://web.stanford.edu/class/cs97si/suffix-array.pdf.

Very well explained sir. It will be very helpful for beginners like me.