CHSTR - Editorial

shouldn’t cnt[1] be incremented by 3 ? Since {a,aba,ababa} all can belong in that group.

can anyone please explain me the significance of these lines in the authors code?

for(auto x : zz) {
cnt[0]++;
cnt[x+1]–;
}

Cool solution!

For everyone out there, who are upset with the editorial of this problem. :slight_smile:

so, I assume you all have understood the working of the Z algorithm.
Let us see how the cnt array is made.

Note that the Z algorithm is applied for each suffix of the given string.
Here, the string is “ababa”.
So, the Z algorithm will be applied over “ababa”, “baba”, “aba”, “ba” and finally “a”.

In Iteration 1: (when the suffix is “ababa”):

Clearly, the Z array we get in this case is:

z[0]=5, z[1]=0, z[2]=3, z[3]=0, z[4]=1.

Once this array is computed, we see that there are 3 entries in the above array which are >= 1.
Similarly, 2 entries >=2, 2 entries>=3, 1 entry >=4 and 1 entry >=5.
Hence, for this iteration, the cnt array will be calculated as:

Since we have: 3 entries >=1 , so, increment cnt[3].

Similarly,
2 entries >=2 , so, increment cnt[2].

2 entries >= 3 , so, increment cnt[2].

1 entries >=4, so, increment cnt[1].

1 entries >=5, so, increment cnt[1].

Thus, after this iteration, our cnt array is:
cnt[1]=2

cnt[2]=2

cnt[3]=1

cnt[4]=0

cnt[5]=0

In iteration second: (when the suffix is “baba”)

The Z array is z[0]=4, z[1]=0, z[2]=2, z[3]=0, z[4]=0.

After this iteration, cnt array becomes:

cnt[1]=4

cnt[2]=4

cnt[3]=1

cnt[4]=0

cnt[5]=0

Now, After all 5 iterations, the final cnt array is

cnt[1]=9

cnt[2]=5

cnt[3]=1

cnt[4]=0

cnt[5]=0

Now, this is the cumulative cnt array. It means, cnt[1] represents how many substring occurs at least 1 time. So, to get exactly 1 time, we do: cnt[1]-cnt[2] (at least 1 time - at least 2 time).

Thus, we get:

cnt[1]=cnt[1]-cnt[2]=9-5=4

cnt[2]=5-1=4

cnt[3]=1-0=1

cnt[4]=0-0=0

cnt[5]=0, to get our final cnt array.

See my solution for implementation. CodeChef: Practical coding for everyone

13 Likes

thank u very much @sumitjha4321 nice solution…

Can anyone please explain what does this mean ->

" From the above array we can conclude that we have 1 substring which can be choosen atleast 3 times. How? Because there are 3 entries in the above array that are greater than or equal to 1. Similarly we can deduce for 2 (2 times), 3 (2 times), 4 (1 time ) and 5(1 time). So we increment cnt[3] once, cnt[2] twice and cnt[1] twice. " ?

@glow I have tried to explain how the cnt array is being made. please see my comment in the 1st page of this editorial.

Yeah bro i got that but i was confused how this happening ->

suppose if we have 4 entries greater than 1 than why do we increment cnt[1], i was not getting why does this mean that we have another substring which is repeated atleast once ??

OR

if we have 2 entries greater than 2 than how come we know that we founded another substring which is repeated greater than equal 2 times ?

Thanks for ur detailed explaination. :smiley:

For anyone getting TLE with suffix tries on subtask 3, the easiest way to work around this is to make them static. Instead of representing parent->child edges via a pointer array, we will use an integer array.

We begin by indexing the root as 0, and each consecutive node will get its own index. We can observe that the trie will have at most maxsz = 1 + n*(n+1)/2 nodes as that is the sum of lengths of its sufixes and +1 for the root node(this is a trivial upper bound). All that’s left is to keep an array T[maxsz][26] such that T[i] represents the list of indices of the children of node i, and C[maxsz], where C[i] represents the number of sufixes that „pass“ through node i in the trie(we don’t care about C[0]).

All that is left is the formation of the trie. We will do this by reseting the index counter to 0. When we form a new node i we fill T[i] with -1, set C[i] to 0 and increment the index counter by 1. Every time we pass node i during insertion we increment C[i] by 1. The reason we don’t just fill T and C with -1 at the start is that we are avoing unnecessary work. Make sure to use memset for these fill operations as the ammount of work done is quite large.

Another idea which might speed up your implementation(if you haven’t already done so) is this method of calculating modular inverses 1 to n modulo some prime number:

modinv[1] = 1;
for(int i = 2; i < MAXN; ++i){
    modinv[i] = MOD - mul(MOD / i, modinv[MOD % i]);
}

mul(a, b) is just the modular multiplication function( mul(a, b) = (a * b) % MOD ). I don’t see this being very significant, though.

My (terrible) implementation: CodeChef: Practical coding for everyone

Can someone please explain me how to construct cnt[] array using suffix array? Any help will be highly appreciated. I am stuck on this Z array explanation for too long. I cant understand it intuitively.

@shiva318 I can explain you concept of Z array and cnt array. Read this carefully Z array i.e. Z[i] is the length of the largest prefix which begins at i and is also among one of the prefix of the whole array. This means say you have a string “appleap” then value of Z array will be {7,0,0,0,0,2,0}. Now its explananation—>

Z[0] = 7 because if you look at prefix of string [0,6] the whole string is prefix of original string

Z[1] = 0 you cant find any prefix of the string s[1,6] which is prefix of original string {“appleap”}

similarly all are zero but when you look at Z[5] it is 2 because string beginning at Z[5] till Z[6] matches the original given string “appleap”. I hope you understood the Z array part. This Z array can be calculated in O(n) complexity but i recommend you to try this in O(n^2) complexity first which would make you easy to understand.

Cnt array calculation---->

Remember that this problem requires Z array computation in O(n) because you dont need to calculate only one Z array but Z array for every suffix of the original string. The example I took —>

appleap ppleap pleap leap eap ap p

And then for each of these suffixes you need to calculate cnt array. The method to calculate cnt array is that when you get a z array for a suffix then—>

say z array is {1,2,3,4} (arbitrary can be wrong )

then find out total entries greater than equal to 1 say it comes here to be 4 so increment cnt[4] by 1, now check how many values are greater than or equal to 2 here it is 3 so increment cnt[3] by 1. Do this for all values till length n i.e. values greater than or equal to n.

In this way you can find cnt[] array. You need to find this cnt[] array for every suffix of “appleap”. Since it was a calculation part it can be directly implemented. Now this cnt array carries information about total substrings which repeat more than i times at each cnt[i]. So to calculate strings which repeated exactly i times answer is cnt[i]-cnt[i+1]. Then you can use combinatorics to calculate your answer easily. For more help comment.

@apptica

Thank You for your response…:slight_smile: I understand what Z array denotes and I read how to implement it in O(n). However, the part in which I have problem is this: "The method to calculate cnt array is that when you get a z array for a suffix then—>

say z array is {1,2,3,4} (arbitrary can be wrong )

then find out total entries greater than equal to 1 say it comes here to be 4 so increment cnt[4] by 1, now check how many values are greater than or equal to 2 here it is 3 so increment cnt[3] by 1. Do this for all values till length n i.e. values greater than or equal to n.". I can understand the method but what I do not understand is why this works. I can not understand it intuitively. I know that this method works but just cant understand why/how, I mean the reason behind it.

@shiva318 The reason why it works is say i have a string as “appleaa”. Now its Z array will be 7 0 0 0 0 1 1

Total entries >=1 are 3 it means that there are 3 strings of each length >=1 which are also prefix of the original array. This means that whatever the case may be at least their first character would have been same as that of the original string. It means we got a string which repeated 3 times. Now since you are calculating this for every prefix viz. applea pplea plea lea ea a…then you will pick each and every case. Now say we are talking about total length>=2 here it is only 1 so we are incrementing cnt[1] because total strings with length >=2 which also matched the prefix are only one. it means a string repeated just one time. Hope it helps…Re comment for more help…also remember that this is a cumulative calculation

1 Like

I tried to break the code but it’s not happening, what should i do next to prevent from it? ENOSITE

yes, i have solved with hashing using rabin karp algo
see my solution CodeChef: Practical coding for everyone
the code is not quite readable but you can understand it.

1 Like

@raja44ever need a little bit explanation of your approach. How did you decided the hash function. Please help ?

I tried the same approach , but TLE.
CodeChef: Practical coding for everyone.
Not able to understand until now what was the reason ?

i counted the no of equal strings for every length i form 1 to n using hashing(map is used to count equal strings) and save them in array a(same as cnt in editorial above)…then i used two optimizations
1.) for every k, i calcalated the answer by considering only non zero cnt[j] where k<=j<=n.
2.) second if for any length i there is no two equal strings than we stop counting cause there is also no equal strings for the length greater than i

Thanks for your great reply understood a lot. But how did you determined a hash function with no collisions.

I tried 3 approaches.

  1. Trie : It would TLE all the time even after many optimizations in the last sub task with O(N^2).
  2. Hashing : I also tried un-ordered mapping of sub strings along with dp to get O(N^2) but this also TLEd on the last sub task.
  3. Suffix Array : Finally, I used SA with the complimentary LCP array to get 100 pts with O(N^2).

So what I could make out was that even though all of them were O(N^2), the constants and over heads in the first two approaches should have been high for the relatively strict time limit.

1 Like