TWOSTRS - Editorial

I am not able to get the part where you said “When the length of strings in A become less than 13, it is no longer the most efficient thing to do so” .

I tried a different approach apart from the trie (I am bad at trees and graphs as of now, so I went with a linear intuition).
Firstly I had a hash function. Then, since the maximum length of the pattern strings is 26, I found all substrings of A and B of lengths ranging from 1 to 26 using brute force, which is around O(26 * 10^3) substrings per string, hashed their values, and stored them in a vector.
Later I sorted those vectors and copied the distinct hash values into a new vector (notice each hash value is in sorted order and has a distinct index).
Then, for each pattern, divide it into all possible prefix/suffix pairs (both non-empty), find the hash value of the prefix and locate its index using binary search, and do the same for the suffix part.
Now iterate over the indices of each prefix and the respective suffix indices, and add the beauty value z into a 1000 × 1000 table where the i-th row indicates the prefix ending point and the j-th column indicates the suffix starting point.
Then check whether the pattern can occur as a whole substring of A or B, and calculate those beauty values separately in a linear array using DP.
Finally, iterate over the table, summing up the complete-prefix and complete-suffix pattern values and taking the maximum of them.


Don't use maps and hashes as they cause TLE… instead go for binary-search index mapping. I know I complicated it; I just hope it will be useful to someone.

When the length becomes less than 13, I will have to do more than 13 operations to check the string in B.
Let's see it this way: when I have a string of length 20 in A, I only have to check a string of length 6 in B. When I have a string of length 15 in A, I have to check a string of length 11 in B. Likewise, when I have a string of length 13 in A, I have to check a string of length 13 in B. But when I have a string of length 12 in A, I have to check a string of length 14 in B. At that point it makes more sense to store the strings of length 14, 15, 16, …, 25 from B and check strings of length 12, 11, 10, …, 1 in A.
I know I am not explaining it well enough. I don't know how I can explain it in a better way.

I'm also not familiar with Aho-Corasick. I knew the algorithm, but didn't know how to apply it.
I used KMP + a trie. But it was marginally exceeding the TL on the final task, since the trie traversal could go up to O(|A| * |B| * 25^2 / 2) in the worst case — that would be a set of strings with the same character everywhere.
So I just added an if-clause in the |A| loop to skip traversing the trie when the preceding 26 characters are exactly the same, and reuse the result from the previous iteration over A. TL: 0.39s

Hardifying problems: initially the problem had lower constraints and was intended to be of easy difficulty. Danya found the Aho-Corasick solution, so we added one more subtask.

The editorial written by Viktar for the initial version is the following:

Since all beauty_i are positive, we can always choose a prefix of A and a suffix of B.
If we choose some prefix A[0…x] and some suffix B[y…size(B)-1], the sum we want to calculate can be divided into 3 sums:

  1. Strings that are completely in the prefix of A
  2. Strings that are completely in the suffix of B
  3. Strings that start in the prefix of A and end in the suffix of B

Sums 1 and 2 we can precompute:
For each position in A, calculate the sum of the strings which end at this position (i.e. strings that are suffixes of the current prefix of A), then build a prefix sum over it.
It can be done with a trie in O(size(A) * max(|S_i|)), plus O(sum of lengths of all S_i) to add the strings into the trie.
The same method works for string B.
To find the 3rd sum for a pair (x, y), we can iterate over all strings that start in A[0…x] and end in B[y…size(B)-1]; their length is <= 26.
It can be done with a trie in O(size(A) * size(B) * max(|S_i|) * max(|S_i|)).

I solved it using a suffix array and a 2D Fenwick tree.
Let dp1[i] be the sum of values of all strings present in A[0…i], and dp2[j] the sum of values of all strings present in B[j…|B|-1]. We can easily compute dp1 and dp2 with either a suffix array or the Aho-Corasick algorithm.
Now, dp[i][j] is the sum of values of the strings that span x = A[0…i] and y = B[j…|B|-1], i.e. the strings formed by combining x + y.
Build the suffix array of reverse(A) and the suffix array of B.
For each s in S, divide it into two strings x1 = s[0…k] and x2 = s[k+1…|s|-1],
where 0 <= k < |s|-1.
And then:

  1. search the reverse of s[0…k] in the suffix array of rev(A); let L1 be the lower bound and R1 the upper bound of this search
  2. search s[k+1…|s|-1] in the suffix array of B; let L2 and R2 be the lower and upper bounds for this string.
    Now add the weight of the string s to dp[L1…R1][L2…R2] using a 2D Fenwick tree.
    Finally, iterate over each i and j and calculate the maximum value.
    My implementation https://www.codechef.com/viewsolution/32970900

Hey, I am new to tries. Can you explain the first part — how you calculated the occurrences in A[0…i] and in B[j…|B|-1]?

I’ll explain for B first.
Put all the strings of S in a trie
The insert function may look like this-

void insert(string& s, int val){
    Node* node = root_node;

    for(int i = 0; i < s.length(); i++){
        if(!(node->children[(int)s[i]])){
            node->children[(int)s[i]] = get_new_node();
        }
        node = node->children[(int)s[i]];
    }
    node->ending_here += val;   // accumulate beauty at the string's end node
}

And the calls would look like this-

for(i=0; i<N; i++){
    B_trie.insert(S[i], beauty[i]);
}

Now first, we want to find

for i = B.length()-1 down to i = 0:
    how many strings in S start at B[i]? (Let this be stored in B_pleasure[i])

To do that, just take the substring of (up to) 26 characters starting at B[i] and run it through the trie.
Second, we want all the substrings that are present anywhere to the right of B[i], rather than just the substrings starting at B[i]. So we will iterate through B_pleasure and do
B_pleasure[i] += B_pleasure[i+1];

Here’s the code related to that
This function works on trie

long long get_value(string s){
    long long value = 0;
    Node* node = root_node;

    for(int i = 0; i < s.length(); i++){
        if(!(node->children[(int)s[i]]))
            break;                       // no string of S continues this way
        node = node->children[(int)s[i]];
        value += node->ending_here;      // strings of S ending at this prefix
    }

    return value;
}

And this function fills the B_pleasure

    s = B.substr(B.length()-1, 1);
    B_pleasure[B.length()-1] = B_trie.get_value(s);

    for(i=B.length()-2; i>=0; i--){
        s = B.substr(i, min((int)B.length()-i, 26));
        B_pleasure[i] = B_pleasure[i+1] + B_trie.get_value(s);
    }

For string A, you can do a very similar thing but you may need a trie of reversed strings in S.


Hey, I have coded the Aho-Corasick based solution here,
but I am somehow getting TLE on three files of the last subtask.
Please help!! @sharepo
PS: suf[i] of a trie node (say, v) stores the sum of scores of all the suffixes of v that have length >= i. Since I have used 0-based indexing, the lengths are scaled down by 1.

Can anyone answer some stupid queries I have?

After reading about the Aho-Corasick algorithm, I find it similar to the KMP algorithm — the suffix links are similar to the failure function we store in KMP. So, can this question be solved using KMP, with a similar way of storing prefix and suffix sums for each index?

I tried solving it using KMP but my implementation was O(|A| * |B| * 26^2), so it failed to pass the 2nd and 3rd subtasks, as I was rebuilding strings for each index.

Is there a reason why this algorithm is the obvious choice for this question?


Ohh okay, I understood why Aho-Corasick is the better choice here.

Here we have a question somewhat similar to this:
Find all occurrences in T of any pattern in
the set of patterns P = {P1, . . . , Pz}.

We can do it the naive way by running KMP z times,
so its computation time becomes O(|T| * z),
whereas Aho-Corasick can do it in linear time.

Thanks for this question and editorial :)

Yes, Aho-Corasick is like KMP on a trie.


This problem has a tight time limit, so similar algorithms that differ just a little bit can get TLE.

Can someone help me out with my implementation of this problem? It shows a wrong answer but I couldn't find any mistake.

https://www.codechef.com/viewsolution/33002794

Your idea looks good, so you only need to optimize your code.

  • Don’t use long longs if it is not necessary.
  • Don’t do a lot of push_back or emplace_back. Use arrays instead.
  • You can save the automaton nodes corresponding to the prefixes of A, so you don't need this loop:

for(i = max(pa - K + 1, 0ll); i <= pa; i++) {
  v = t[v].go[a[i] - 'a'];
}

Here is your optimized solution.


Try this test:

Test
1
abc
def
3
d 100
bcef 51
ce 50

The correct answer is 101.

Yup, I applied the GFG approach and got TLE.

Whoa! How did you do that! I could never think of those nasty little optimizations!
So, let me guess: the compiler works faster with smaller data types because it has to allocate smaller chunks of bytes, which it finds more easily in memory, right?
Likewise, if we use arrays, it doesn't have to reallocate the chunks again and again, searching all over the memory, right?
Thank you!

I was getting the same problem. It can be optimized with the Aho-Corasick algorithm.