KILLKTH - Editorial

melfice · January 6, 2018, 4:52pm

PROBLEM LINK:

Practice
Contest

Author: Satyam Shubham
Tester: Alexey Zayakin
Editorialist: Oleksandr Kulkov

DIFFICULTY:

MEDIUM-HARD

PREREQUISITES:

Suffix tree

PROBLEM:

You’re given string s. String t is obtained by concatenating all of s substrings in lexicographical order. You have to answer queries of finding k^{th} letter in t in online.

QUICK EXPLANATION

For each vertex of suffix tree calculate how much letters will it take to write down all substrings up to the one ending in this vertex. Use binary search over these values to find substring from which k^{th} letter is taken.

EXPLANATION:

When it comes to substrings it’s only natural to use some suffix structure, namely suffix tree. Let’s recall how to get all substrings of s in lexicographical order given suffix tree of s. If it is not compressed then you can just walk over all its vertices in lexicographical order and the next substring will be the one corresponding to path from the root to the current vertex.

Now consider compressed suffix tree. Let’s sort all vertices of the tree in lexicographical order and calculate for each explicit vertex v how many letters we will obtain if we concatenate all substrings up to the one corresponding to this vertex and denote it by cnt_v. Obviously, cnt_{root}=0. Now denote explicit parent of vertex v by par_v, vertex previous in lexicographical order to v by pre_v and length of substring corresponding to vertex v by len_v, then

cnt_v= cnt_{pre_v}+(len_{par_v}+1)+(len_{par_v}+2)+\dots+(len_v)

cnt_v = cnt_{pre_v} + \dfrac{(len_v + len_{par_v}+1)(len_v-len_{par_v})}{2}

After we calculated this, we are ready to answer queries. If we have number k in the query, we should find such vertex v that

cnt_{pre_v} < k \leq cnt_v

That will mean that substring which holds k^{th} letter is on the edge between par_v and v. Now we can find particular length of such substring. It will be equal to m such that

\sum\limits_{i=len_{par_v} + 1}^{m-1} i< k - cnt_{pre_v}\leq \sum\limits_{i=len_{par_v}+1}^{m} i

We know where can we find any occurence of strings from edge in s and we know length of particular string. Now only thing left to do for us is to output letter number \left(k-cnt_{pre_v}-\sum\limits_{i=len_{par_v}+1}^{m-1} i\right) of this string.

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution can be found here.
Tester’s solution can be found here.

RELATED PROBLEMS:

kaushal101 · January 15, 2018, 9:45pm

This question had weak test cases and many submissions had passed for the same WEAK TEST CASES IN KILLKTH PROBLEM IN JANUARY LONG - general - CodeChef Discuss

triveni · January 16, 2018, 4:45am

Now that everyone knows testcases of this problem are super weak. I would like to add one more weakness to the list. The value of K was never big enough. That means even binary search was needed to find the answer for K. Simply iterate through the sorted suffixes till you find the Kth character. On an average length of a suffix will be 10^5. You do not need to iterate on many suffixes.

For the author @killjee : Make sure that when you generate random numbers the seed is correctly set - large enough. In my machine (also probably in codechef too) it is by default set to be MAX_INTEGER. So, taking rand() % K - where K is very large of the order 1e18 or so - is not correct. Better way is to use - (rand() * (long long)rand()) % K. Or use your own made rng.

gomu · January 16, 2018, 12:40pm

I used a suffix array based approach with precalculating number of substrings starting with particular index and used binary search along with it. The time complexity for building the suffix array is O(n*(logn)^2) and time to answer one query is O(2*logn). But it gave WA for all the subtasks. My Solution. I manually tested and generated the hidden string and it was correct for a lot of cases. Can anyone help me in why this submission gave WA?
If possible try to give test cases in which this fails. Thank you.

pk301 · January 24, 2018, 7:45am

can anyone please explain me how the cntV is calculated?

vivek_1998299 · January 24, 2018, 1:50pm

@pk301

Ohk , consider the suffix tree of string abcd,so there will be an edge “abcd” between root and some vertex(V1).

so this could be conidered as “a”,“ab”,“abc”,“abcd”
so cntV1=1+2+3+4=10

Similarly u can calculate if the parent of a vertex is not root,
lets say there is an edge “efg” between v1 and some vertex v2.

so the substrings in lexicographic order would be:
“abcde”,“abcdef”,“abcdefg”

so cntV2= cntV1 + 5+6+7 = 28

(Note this was the case when there was 1 leaf inside its subtree(ie 1 occurrence) if there were more then u would have to do something like this,
lets say occurrence =2,so for v2 we’ll have “abcde”,“abcde”,“abcdef”,“abcdef”,“abcdefg”,“abcdefg”

ie cntV2=cntV1+occ*(len+…)
)

pk301 · January 26, 2018, 7:22pm

@author problem is awesome but please add some extra test cases. Test cases are so weak that completely wrong solutions got accepted for 80 points! I myself wasted 2-3 hours thinking that there is some edge cases that i’m missing but afterwards i found that my approach was completely wrong. So, please add some strong cases as it will help others who solve this nice problem later

spj_29 · February 2, 2018, 2:42am

Can anyone explain the solution of this problem using suffix arrays?
The testers solution uses suffix arrays but i am unable to understand it.

neeraj745 · March 31, 2018, 4:12am

can someone please explain tester’s solution?

tmwilliamlin · January 18, 2018, 6:15am

The tester’s solution is suffix array based, you can try to stress test against it.

rj25 · January 18, 2018, 10:49pm

Can anyone explain with simple example how is len_v and cnt_v obtained ?

pk301 · January 24, 2018, 7:56pm

thanks for this! Can you please answer my one more doubt : After calculating cntV for all the nodes, How to land on correct node efficiently at time of query? I have thought of many ways but got stuck at each of them. One of the ways that i thought is we first look for “in which subtree of root our K exist” Then suppose we get some ‘ch’ and then we go and found at which path this ‘K’ exist and then simply traverse up the tree(atmost logN nodes).

But I got stuck at second step(at which path(from root to leaf) the K exist)!
Can you please help! Or just tell me your approach

vivek_1998299 · January 24, 2018, 8:33pm

I’ll tell what i did.

See u can store this thing in an array,like while doing dfs store in an array cntV ,(start,end) index of edge and parentLength(for v2 its 4(“abcd”))

like in prev example when i reach v1
arr[0].cnt=10 , arr[0].stEnd=(st,End)=(0,3) ,arr[0].parentLength=0

when reach v2
arr[1].cnt=28 , arr[1].stEnd=(4,6) ,arr[0].parentLength=4

so now lets say i get a k,so i can simply binary search in arr.cnt,so i’ll know which index it lies(like for k=13 it lies in arr[1])

Once i get that, I know ans lies in this block

vivek_1998299 · January 24, 2018, 8:34pm

Now I can subtract arr[index-1].cnt (ie K-=arr[index-1].cnt ),this gives me the position at which k lies in this block(index)

Now the length of substrings for v2 would be 4,5,6 so again i wanna know which substring does it belong

So this is an AP,and i have to find n such that Sn>=k(I can get it in O(1) using formula:-n/2(2a+(n-1) * d), once i Know that i can get the character using (st,end)

pk301 · January 24, 2018, 8:55pm

Yes, that’s what i think first but tell me one thing at every node we are storing no. of substring upto this. So, suppose we have a tree branch ab -> cd (it has two branch ‘e’ and ‘f’) Now, at ‘ab’ we have 2, at ‘cd’(4) at e(5) and at f(5). Now, when we dfs we have (2, 4, 5, 6) but my doubt is when we are taking ‘abcdf’ we have actually taken ‘abcde’. so, can’t we subtract that(‘abcde’ i.e. 1) from k and search for remaining k?

vivek_1998299 · January 24, 2018, 9:20pm

I dont think i get ur question properly.

Firstly i never stored no. of substrings (i stored cntV and all)

Secondly,if i got ur point correctly are u saying doing dfs at each queries,that will be O(n) per query and would time out

pk301 · January 24, 2018, 11:32pm

“if we concatenate all substrings up to the one corresponding to this vertex and denote it by cntv” I mean this. We are storing cntV and let us suppose a branch ab -> cd ,(cd -> e and cd -> f) is there in suffix tree and we have cntV for all (2(ab), 9(cd), 14(e), 14(f)). Now, after dfs, array -> (2, 9, 14, 14). Now, my question is we are just finding index with value < k in this array but before visiting 2nd 14(f) we have visited 14(e) so, can’t we subtract this substring length from k and search for remaining k?

vivek_1998299 · January 25, 2018, 1:14am

I dont understand this line “can’t we subtract this substring length from k and search for remaining k?” ,why should i subtract substring length,however u can subtract sum of lengths of all substring with “abcde” as prefix and search for remaining k

pk301 · January 25, 2018, 8:28am

Sorry, I mean that only : “subtract sum of lengths of all substring with “abcde” as prefix and search for remaining k”. But in your solution you are simply searching on the array?

And second thing it may happen that the array is not sorted then how does binary search works there?

vivek_1998299 · January 25, 2018, 10:21am

Ooh sorry ,i actually maintained a globl count sort of thing ,and at each node i saved cnt which was just equal to (len)+(len+1)+… and not cntParent + len+… . So while dfs i did this :- assign array indx range : count - (count + cnt),and count=count+cnt,index=index+1

You can refer my solution(as i too have forgotten most of the things(sorry)) : CodeChef: Practical coding for everyone