CHSTR - Editorial

I haven’t heard about the trie DS till now, but I think I used a similar approach, and after a lot of submissions, I finally got AC 100pts.

I don’t know anything about suffix arrays either, I guess I’ll read about them now.

Basically I went through the entire string and saved the indices of all a’s,b’s,c’s…z’s separately in vectors.
Then, I would pick any one vector, and for each index look at index+1, and again store all indexes of a’s,b’s,c’s…z’s. If the number of alphabets is more than 1, only then will it be used, so whenever any such vector had more than 1 alphabet, then I would make an entry in a multiset of the number of alphabets.

Example: You start with the vector of all 1st a’s. Now you look at index+1 for each a, and find say 3 a’s 2 b’s 1 c
this means you have 3 strings of the form ‘aa’ , two of the form ‘ab’ and 1 of the form ‘ac’
The contribution we care about is only from where we have 3 equal strings ‘aa’ and 2 equal ‘ab’, we don’t care about ac,
further, we don’t need to recursively use the vector containing only this one c, because it will only lead to a string which will never match any other, as we only have 1 ‘ac’, we will always have at most 1 ‘aca’,‘acb’,‘acc’ etc.

Using this approach, I was able to calculate the required multiset, in which there were entries such as 2 2 2 3 3 4 4 etc, denoting the number of equal strings of various types.

I further shortened this to a simple vector which had pair of values, the actual number and number of occurrences, eg: in 2 2 2 3 4 4
we have entries for vector as pairs
(2 3)
(3 1)
(4 2)

As I had used a multiset, these values were already sorted, even in the created vector.

After this I simply used the logic of inverse modulo and saving inverse factorials to compute nCr. I later precomputed all values till 5000! % MOD and stored them in an array, and then when required computed r, n-r inverse factorials if they weren’t already computed and stored.

I was also sorting the given queries, so as to answer the same query value without having to recompute, like if k=3 is given 5 times in all, then it will be computed once, and then the result will be reused.

I was getting AC in all tasks except 1 of the last subtask.

To my surprise, in that task, if I would just terminate the program before the actual nCr calculating part, and only let the entire multiset and vector creation take place, it was taking a running time of 0.08 s, but after the nCr calc, it was giving TLE.

I tried using putchar_unlocked and using getchar for string input instead of scanf/printf, but the running times were almost exactly the same.

For a little more shocking experience, in one of the statements of the nCr calculating function, there were 3 terms being multiplied, and if I would just comment out 1 of them, it was giving WA with a running time of about 0.8s but no TLE.

Anyway, after a lot of tweaking around, I finally managed to get AC by removing 1 % operation in a certain loop. I was stunned that such a small change got my submission accepted! but well, at least it did get submitted finally.

Here’s my final solution CodeChef: Practical coding for everyone

Editorials are supposed to provide an opportunity for us to learn. Not point to a third party link! Wake up guys!

@antoniuk1, Sir, your Z transform function should fill the vector z till ‘n’, as given in the example of editorial, also the program is giving compilation error :slight_smile:

can anyone please explain how to use the Z-array to calculate the cnt array …! thanks

Well, from the editorial I could only understand that Z function was needed to solve this problem.
How I can use Z function? Still no idea.
Things are not properly explained, perhaps editorial is written for people who already solved it?

Just to point out, it is mentioned in the editorial that Z function is calculated for all suffixes of given string, whereas in the Author’s solution I found this:

for(i=0; i<s.size(); i++){
   vector<int> zz = z_function(s.substr(i, n-i));
}

I hope, quality of editorials will improve on Codechef in near future.

2 Likes

I did this via a counting-sort type algorithm. It is O(n^2), but in practice it works much better.
Here is the solution:
http://www.codechef.com/viewsolution/7138349

Basically, we start sorting by the first character. Then we get some distinct regions
For each region, we sort by the second character etc.

Could someone explain more clearly how cnt is calculated from z? Thanks in advance.

shouldn’t cnt[1] be incremented by 3 ? Since {a,aba,ababa} all can belong in that group.

can anyone please explain me the significance of these lines in the authors code?

for(auto x : zz) {
cnt[0]++;
cnt[x+1]–;
}

Cool solution!

For everyone out there, who are upset with the editorial of this problem. :slight_smile:

so, I assume you all have understood the working of the Z algorithm.
Let us see how the cnt array is made.

Note that the Z algorithm is applied for each suffix of the given string.
Here, the string is “ababa”.
So, the Z algorithm will be applied over “ababa”, “baba”, “aba”, “ba” and finally “a”.

In Iteration 1: (when the suffix is “ababa”):

Clearly, the Z array we get in this case is:

z[0]=5, z[1]=0, z[2]=3, z[3]=0, z[4]=1.

Once this array is computed, we see that there are 3 entries in the above array which are >= 1.
Similarly, 2 entries >=2, 2 entries>=3, 1 entry >=4 and 1 entry >=5.
Hence, for this iteration, the cnt array will be calculated as:

Since we have: 3 entries >=1 , so, increment cnt[3].

Similarly,
2 entries >=2 , so, increment cnt[2].

2 entries >= 3 , so, increment cnt[2].

1 entries >=4, so, increment cnt[1].

1 entries >=5, so, increment cnt[1].

Thus, after this iteration, our cnt array is:
cnt[1]=2

cnt[2]=2

cnt[3]=1

cnt[4]=0

cnt[5]=0

In iteration second: (when the suffix is “baba”)

The Z array is z[0]=4, z[1]=0, z[2]=2, z[3]=0, z[4]=0.

After this iteration, cnt array becomes:

cnt[1]=4

cnt[2]=4

cnt[3]=1

cnt[4]=0

cnt[5]=0

Now, After all 5 iterations, the final cnt array is

cnt[1]=9

cnt[2]=5

cnt[3]=1

cnt[4]=0

cnt[5]=0

Now, this is the cumulative cnt array. It means, cnt[1] represents how many substring occurs at least 1 time. So, to get exactly 1 time, we do: cnt[1]-cnt[2] (at least 1 time - at least 2 time).

Thus, we get:

cnt[1]=cnt[1]-cnt[2]=9-5=4

cnt[2]=5-1=4

cnt[3]=1-0=1

cnt[4]=0-0=0

cnt[5]=0, to get our final cnt array.

See my solution for implementation. CodeChef: Practical coding for everyone

13 Likes

thank u very much @sumitjha4321 nice solution…

Can anyone please explain what does this mean ->

" From the above array we can conclude that we have 1 substring which can be choosen atleast 3 times. How? Because there are 3 entries in the above array that are greater than or equal to 1. Similarly we can deduce for 2 (2 times), 3 (2 times), 4 (1 time ) and 5(1 time). So we increment cnt[3] once, cnt[2] twice and cnt[1] twice. " ?

@glow I have tried to explain how the cnt array is being made. please see my comment in the 1st page of this editorial.

Yeah bro i got that but i was confused how this happening ->

suppose if we have 4 entries greater than 1 than why do we increment cnt[1], i was not getting why does this mean that we have another substring which is repeated atleast once ??

OR

if we have 2 entries greater than 2 than how come we know that we founded another substring which is repeated greater than equal 2 times ?

Thanks for ur detailed explaination. :smiley:

For anyone getting TLE with suffix tries on subtask 3, the easiest way to work around this is to make them static. Instead of representing parent->child edges via a pointer array, we will use an integer array.

We begin by indexing the root as 0, and each consecutive node will get its own index. We can observe that the trie will have at most maxsz = 1 + n*(n+1)/2 nodes as that is the sum of lengths of its sufixes and +1 for the root node(this is a trivial upper bound). All that’s left is to keep an array T[maxsz][26] such that T[i] represents the list of indices of the children of node i, and C[maxsz], where C[i] represents the number of sufixes that „pass“ through node i in the trie(we don’t care about C[0]).

All that is left is the formation of the trie. We will do this by reseting the index counter to 0. When we form a new node i we fill T[i] with -1, set C[i] to 0 and increment the index counter by 1. Every time we pass node i during insertion we increment C[i] by 1. The reason we don’t just fill T and C with -1 at the start is that we are avoing unnecessary work. Make sure to use memset for these fill operations as the ammount of work done is quite large.

Another idea which might speed up your implementation(if you haven’t already done so) is this method of calculating modular inverses 1 to n modulo some prime number:

modinv[1] = 1;
for(int i = 2; i < MAXN; ++i){
    modinv[i] = MOD - mul(MOD / i, modinv[MOD % i]);
}

mul(a, b) is just the modular multiplication function( mul(a, b) = (a * b) % MOD ). I don’t see this being very significant, though.

My (terrible) implementation: CodeChef: Practical coding for everyone

Can someone please explain me how to construct cnt[] array using suffix array? Any help will be highly appreciated. I am stuck on this Z array explanation for too long. I cant understand it intuitively.

@shiva318 I can explain you concept of Z array and cnt array. Read this carefully Z array i.e. Z[i] is the length of the largest prefix which begins at i and is also among one of the prefix of the whole array. This means say you have a string “appleap” then value of Z array will be {7,0,0,0,0,2,0}. Now its explananation—>

Z[0] = 7 because if you look at prefix of string [0,6] the whole string is prefix of original string

Z[1] = 0 you cant find any prefix of the string s[1,6] which is prefix of original string {“appleap”}

similarly all are zero but when you look at Z[5] it is 2 because string beginning at Z[5] till Z[6] matches the original given string “appleap”. I hope you understood the Z array part. This Z array can be calculated in O(n) complexity but i recommend you to try this in O(n^2) complexity first which would make you easy to understand.

Cnt array calculation---->

Remember that this problem requires Z array computation in O(n) because you dont need to calculate only one Z array but Z array for every suffix of the original string. The example I took —>

appleap ppleap pleap leap eap ap p

And then for each of these suffixes you need to calculate cnt array. The method to calculate cnt array is that when you get a z array for a suffix then—>

say z array is {1,2,3,4} (arbitrary can be wrong )

then find out total entries greater than equal to 1 say it comes here to be 4 so increment cnt[4] by 1, now check how many values are greater than or equal to 2 here it is 3 so increment cnt[3] by 1. Do this for all values till length n i.e. values greater than or equal to n.

In this way you can find cnt[] array. You need to find this cnt[] array for every suffix of “appleap”. Since it was a calculation part it can be directly implemented. Now this cnt array carries information about total substrings which repeat more than i times at each cnt[i]. So to calculate strings which repeated exactly i times answer is cnt[i]-cnt[i+1]. Then you can use combinatorics to calculate your answer easily. For more help comment.

@apptica

Thank You for your response…:slight_smile: I understand what Z array denotes and I read how to implement it in O(n). However, the part in which I have problem is this: "The method to calculate cnt array is that when you get a z array for a suffix then—>

say z array is {1,2,3,4} (arbitrary can be wrong )

then find out total entries greater than equal to 1 say it comes here to be 4 so increment cnt[4] by 1, now check how many values are greater than or equal to 2 here it is 3 so increment cnt[3] by 1. Do this for all values till length n i.e. values greater than or equal to n.". I can understand the method but what I do not understand is why this works. I can not understand it intuitively. I know that this method works but just cant understand why/how, I mean the reason behind it.

@shiva318 The reason why it works is say i have a string as “appleaa”. Now its Z array will be 7 0 0 0 0 1 1

Total entries >=1 are 3 it means that there are 3 strings of each length >=1 which are also prefix of the original array. This means that whatever the case may be at least their first character would have been same as that of the original string. It means we got a string which repeated 3 times. Now since you are calculating this for every prefix viz. applea pplea plea lea ea a…then you will pick each and every case. Now say we are talking about total length>=2 here it is only 1 so we are incrementing cnt[1] because total strings with length >=2 which also matched the prefix are only one. it means a string repeated just one time. Hope it helps…Re comment for more help…also remember that this is a cumulative calculation

1 Like