wa in SSTORY suffix arrays

coda · March 18, 2014, 3:56pm

i kept getting wa in suffix arrays could the author @kuruma or anyone provide good test cases

daiver19 · March 18, 2014, 4:13pm

I’m not sure if that problem can be solved with suffix arrays. The problem is that with suffix array/lcp approach you may miss the leftmost matching. Imagine lcp array k…(k + n)…k where the first k corresponds to the maximum common substring and all other lcp entries correspond to only one string. With simple approach you’ll treat the position in second string as suffix array index around the first k, however you should check all further lcp entries. Probably you can find minimum fast with segment tree, however I’m not sure if it passes the time limit. I’ve used suffix automaton for solving these.

vineetpaliwal · March 18, 2014, 4:17pm

@coda : Try this case

babaazzzzyy

badyybac

The suffix array will contain baa… (From 1st string ) , baba… ( from first string ) , bac ( from second string ) , bad from second string .

So if you are examining consecutive entries of SA then you will find a match at “baba” and “bac” and find the index of “ba” as 7 in second string , even though its actually at index 1 also .

Its likely that you may output “yy” instead of “ba” .

vineetpaliwal · March 18, 2014, 4:31pm

@daiver19 : I have solved this problem using “Suffix Arrays” .

coda · March 18, 2014, 4:38pm

thax @vineeetpaliwal u were correct wish i could figure it out before
thax a lot

i rectified the aecd acad case during the contest couldnt figure out this one

imranraad · March 18, 2014, 4:43pm

I solved the problem with suffix array with the help of hashing. Though I didn’t figure out where my suffix array failed, I got ac using hash. My code with only suffix array also failed to the case from @vineetpaliwal.

tinchy_stryder · March 18, 2014, 4:58pm

My code handles the case posted by @vineetpaliwal. But still, got WA.

Here’s the link to my code: wct7zH - Online C++ Compiler & Debugging Tool - Ideone.com

Can someone please help me find where I’m making the error?

kuruma · March 18, 2014, 5:08pm

Hello @all,

As @vineetpaliwal pointed out (and as it was pointed out to me during the contest) the issue with Suffix Arrays is that only examining consecutive characters might yield WA on some test cases (I only knew how to apply Suffix Automaton to solve this, as I read it on CLRS book), which the Suffix Automaton didn’t detect (might be related with some properties of Suffix Arrays).

Note that I’ve only read about Suffix Arrays while setting this problem and I still need to learn it properly myself

Hopefully setting problems harder for me might push my own boundaries further

Thanks,

Bruno

rishul_nsit · March 18, 2014, 5:37pm

Hi everyone

I used suffix arrays to solve this question and my solution sustained all retests successfully.
The algorithm is:

Construct the suffix array sorted lexicographically.
Construct the lcp array from the suffix array’s consecutive entries.
Mark the entries in lcp array whose suffixes start from different strings.
Find the max length of possible lcs (longest common substring) from the lcp array (equal to the max value of marked entries in lcp array).
Traverse over lcp array again and for all marked entries equal to max value, construct the lcs from the suffix array entries. Insert this lcs into a set. This is a set of possible lcs.
Now traverse over the second string, generate all substrings of length equal to max possible lcs length ( This can be done in O(n) ), search them in the set of possible lcs and break as soon as you find a match. The first match is your answer. This is the lcs which occurs first in the second string.

Link to the solution: CodeChef: Practical coding for everyone

If number of possible lcs are ‘k’, then the complexity of my solution is O(nlgk). Basically any possible mistakes are avoided because all possible lcs are generated and the one which occurs first in second string is printed always.

Learnt a lot from this question, thanks to @kuruma

randomusername · March 18, 2014, 5:44pm

I must say I enjoyed this problem, despite it’s annoying regrades. I used SA+LCP, and then realized that finding the leftmost LCS in the second string isn’t that easy. So after a day or two, I found a solution. It might be an overkill, but I had to implement three more necessary structures:

L : L[i]=MAXj, such that j < i and LCP[j]<LCP[i]

R : R[i]=MINj, such that j > i and LCP[j]<LCP[i]

(basically we want to find first element smaller than LCP[i] on both sides of the array)

CS (a structure that is used to retrieve RMaxQ in O(1), segment trees could work too)

Then I noticed that for every pair of consecutive suffixes in SA we can find the “best” (first in the second string) appearance of their common prefix in the second string using these structures. We just do RMaxQ(L[i]+1,R[i]).

I can elaborate this even further and show an example, if anyone is interested, but please share some solutions that use SA+LCP but are different than mine, I’m curious.

Cheers

jangwa · March 18, 2014, 4:19pm

Sounds good … Can you please give me link or idea about suffix automaton?

rishul_nsit · March 18, 2014, 4:21pm

@vineetpaliwal isn’t the answer to this test case “aba” and not “ba”?

jangwa · March 18, 2014, 4:23pm

What should be the answer ba or aba ?

betlista · March 18, 2014, 4:25pm

you have to find the longest…

kcahdog · March 18, 2014, 4:27pm

Should be ‘aba’

vineetpaliwal · March 18, 2014, 4:30pm

@all : Have edited to remove an “a” from second string . It was just unintentionally there .

jangwa · March 18, 2014, 4:34pm

okay got it

vineetpaliwal · March 18, 2014, 4:40pm

@coda : Please upvote and accept .

daiver19 · March 18, 2014, 4:43pm

Sorry, but I don’t have any articles in english. You may try just to search for it. Also suffix trees should definitely solve this, however they’re harder to implement in my opinion.

coda · March 18, 2014, 4:50pm

sorry @vineetpaliwal i didnt know abt the “accept” thing and also the “award point” thing thax