TWOSTRS - Editorial

sharepo · May 11, 2020, 9:30am

PROBLEM LINK:

Practice
Div-2 Contest
Div-1 Contest

Author: Viktar Sharepa
Tester: Danya Smelskiy
Editorialist: Viktar Sharepa

DIFFICULTY:

Medium

PREREQUISITES:

Aho–Corasick, trie, prefix-sum.

PROBLEM:

We are given an array of strings S, the beauty of each string t in S is given by a positive integer b_t.

The pleasure of a string C is defined by {\sum\limits_{t \in S}\text{(number of occurrences of }t\text{ in C)}\cdot b_t}.

Given two strings A, B, find the string with maximum pleasure that can be constructed by taking one substring of A and adding one substring of B to its end.

QUICK EXPLANATION:

The string with maximum pleasure can be constructed by taking a non-empty prefix of A and a non-empty suffix of B.
Let’s calculate the answer for all the {|A| \cdot |B|} possibilities. Suppose we choose {A[0..i]} and {B[j..|B|-1]}. There are three types of string occurrences:
1. Occurrences that are completely in {A[0..i]}. We can precompute these sums for each i using the Aho-Corasick algorithm.
2. Occurrences that start in {A[0..i]} and end in {B[j..|B|-1]}. All such occurrences end in {B[j..j+24]} so we can take 25 steps from trie node corresponding to {A[0..i]} in the Aho-Corasick automaton. We can precompute such nodes along with the sums of the first case.
3. Occurrences that are completely in {B[j..|B|-1]}. Note that we have already counted the ones that are completely in {B[j..j+24]}. All that is left to count are occurrences that end in {B[j+25..|B|-1]}. We can do it similarly to the first case

EXPLANATION:

Given a string C = {A[k..i]} + {B[j..l]} we can construct another string D = {A[0..i]} + {B[j..|B|-1]}. Note that any substring of C is also a substring of D, since all the beauties are positive, the pleasure of D is not smaller than the pleasure of C. That means that the string with maximum pleasure always can be constructed by taking a non-empty prefix of A and a non-empty suffix of B.

There are {|A| \cdot |B|} ways of taking a prefix of A and a suffix of B. To quickly calculate the pleasure of each resulting string, we’ll use the following data structures:

Let’s build an automaton on all the strings from S using the Aho–Corasick algorithm. For each node u let sum[u] be the sum of beauties of all strings which nodes are ancestors of u in the suffix link tree.
Let prefASum[i] be the sum of beauties of all occurrences of strings from S in the prefix {A[0..i]} and prefANode[i] be the automaton node corresponding to the prefix {A[0..i]}. The arrays prefASum and prefANode can be calculated iterating over A character by character and moving on the automaton.

C++ code

vector<long long> prefASum(a.size());
vector<int> prefANode(a.size());
for (int i = 0; i < a.size(); i++) {
    ahoCorasick.move(a[i]);
    prefASum[i] = ahoCorasick.getCurrentNodeSum();
    if (i != 0) {
        prefASum[i] += prefASum[i - 1];
    }
    prefANode[i] = ahoCorasick.currentNode;
}

Similarly to the previous step let suffBSum[i] be the sum of beauties of all occurrences of strings from S in B that end in the suffix {B[i..|B|-1]}.

C++ code

vector<long long> suffBSum(b.size());
for (int i = 0; i < b.size(); i++) {
    ahoCorasick.move(b[i]);
    suffBSum[i] = ahoCorasick.getCurrentNodeSum();
}
for (int i = (int) b.size() - 2; i >= 0; i--) {
    suffBSum[i] += suffBSum[i + 1];
}

Consider the string C = {A[0..i]} + {B[j..|B|-1]}. To calculate the pleasure of C, let’s analyze the three types of string occurrences in it:

Occurrences that are completely in {A[0..i]}.
Occurrences that start in {A[0..i]} and end in {B[j..|B|-1]}.
Occurrences that are completely in {B[j..|B|-1]}.

The sum of occurrences of the first type is equal to prefASum[i].
Let’s start at prefANode[i] and move in the automaton character by character of the substring {B[j..j+24]}, keep adding the sum[node] of each node that we visit. In this way we get the sum of beauties of all occurrences that lie in the concatenation of {A[0..i]} and end at {B[j..j+24]}, remember that the strings in S have a maximum lenght of 26. This will allow us to calculate the sum of all the occurrences of the second type and some occurrences of the third type.
All is left to add to current sum are occurrences of the third type that end somewhere in {B[j+25...|B|-1]}. Their sum is equal to suffBSum[j + 25].

Time complexity per test case: build an automaton in {\mathcal{O}(\sum\limits_{t \in S}|t|)} + calculate prefASum, prefANode, suffBSum in {\mathcal{O}(|A|+|B|)} + calculate the pleasure for each prefix and suffix pair in {\mathcal{O}(|A| \cdot |B| \cdot \max\limits_{t \in S}|t|)} = {\mathcal{O}(|A| \cdot |B| \cdot \max\limits_{t \in S}|t| + \sum\limits_{t \in S}|t|)}.

SOLUTIONS:

Setter's Solution

#include <bits/stdc++.h>

using namespace std;

template<int ALPHABET_SIZE, unsigned char MINIMAL_CHAR>
struct AhoCorasick {
    static constexpr int NON_EXISTENT_NODE_ID = -1;
    static constexpr int FAKE_NODE_ID = 0;
    static constexpr int ROOT_ID = 1;

    int currentNode;

    vector<array<int, ALPHABET_SIZE>> edges;
    vector<int> suffixLink;

    vector<long long> sum;

    explicit AhoCorasick(const vector<pair<string, int>> &a) : currentNode(ROOT_ID), edges(2),
                                                               suffixLink(2, FAKE_NODE_ID), sum(2, 0) {
        edges[FAKE_NODE_ID].fill(ROOT_ID);
        edges[ROOT_ID].fill(NON_EXISTENT_NODE_ID);

        for (const auto &p : a) {
            int node = ROOT_ID;
            for (unsigned char c : p.first) {
                c -= MINIMAL_CHAR;
                if (edges[node][c] == -1) {
                    edges[node][c] = edges.size();
                    edges.emplace_back();
                    edges.back().fill(NON_EXISTENT_NODE_ID);
                    suffixLink.push_back(NON_EXISTENT_NODE_ID);
                    sum.push_back(0);
                }
                node = edges[node][c];
            }
            sum[node] += p.second;
        }

        queue<int> q;
        q.push(ROOT_ID);
        while (!q.empty()) {
            int node = q.front();
            if (suffixLink[node] != NON_EXISTENT_NODE_ID) {
                sum[node] += sum[suffixLink[node]];
            }
            q.pop();
            for (int i = 0; i < ALPHABET_SIZE; i++) {
                int child = edges[node][i];
                if (child == NON_EXISTENT_NODE_ID) {
                    edges[node][i] = edges[suffixLink[node]][i];
                } else {
                    suffixLink[child] = edges[suffixLink[node]][i];
                    q.push(child);
                }
            }
        }
    }

    void setNode(int node) {
        currentNode = node;
    }

    void resetNode() {
        setNode(ROOT_ID);
    }

    long long getCurrentNodeSum() {
        return sum[currentNode];
    }

    void move(unsigned char c) {
        c -= MINIMAL_CHAR;
        currentNode = edges[currentNode][c];
    }
};

void solve() {
    string a, b;
    cin >> a >> b;
    int n;
    cin >> n;
    vector<pair<string, int>> s(n);
    for (int i = 0; i < n; i++) {
        cin >> s[i].first >> s[i].second;
    }
    AhoCorasick<26, 'a'> ahoCorasick(s);

    vector<long long> prefASum(a.size());
    vector<int> prefANode(a.size());
    for (int i = 0; i < a.size(); i++) {
        ahoCorasick.move(a[i]);
        prefASum[i] = ahoCorasick.getCurrentNodeSum();
        if (i != 0) {
            prefASum[i] += prefASum[i - 1];
        }
        prefANode[i] = ahoCorasick.currentNode;
    }

    ahoCorasick.resetNode();
    vector<long long> suffBSum(b.size());
    for (int i = 0; i < b.size(); i++) {
        ahoCorasick.move(b[i]);
        suffBSum[i] = ahoCorasick.getCurrentNodeSum();
    }
    for (int i = (int) b.size() - 2; i >= 0; i--) {
        suffBSum[i] += suffBSum[i + 1];
    }

    long long ans = 0;
    for (int i = 0; i < a.size(); i++) {
        for (int j = 0; j < b.size(); j++) {
            long long cur = prefASum[i];

            ahoCorasick.setNode(prefANode[i]);
            for (int k = j; k <= j + 24 && k < b.size(); k++) {
                ahoCorasick.move(b[k]);
                cur += ahoCorasick.getCurrentNodeSum();
            }
            if (j + 25 < b.size()) {
                cur += suffBSum[j + 25];
            }

            ans = max(ans, cur);
        }
    }
    cout << ans << '\n';
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(nullptr);
    cout.tie(nullptr);
#ifdef HOME
    freopen("in.txt", "r", stdin);
#endif
    int tests;
    cin >> tests;
    while (tests--) {
        solve();
    }
}

Tester's Solution

#include <iostream>
#include <string>
#include <cassert>
#include <cstdio>
#include <cstring>
#include <vector>
using namespace std;
 
int Go(int node_id, int next_char);
int get_link(int node_id);
 
const int N = 10'000 + 5;
const int NODES_CNT = N * 26 + 5;
 
string s[N];
int price[N];
 
int nodes = 0;
int nxt[NODES_CNT][26];
int go[NODES_CNT][26];
int link[NODES_CNT];
int node_value[NODES_CNT];
int anc_c[NODES_CNT];
int anc[NODES_CNT];
bool processed[NODES_CNT];
int node_price[NODES_CNT];
 
long long dp[1005];
void clear_node(int id) {
    memset(nxt[id], 0, sizeof(nxt[id]));
    memset(go[id], 255, sizeof(go[id]));
    link[id] = -1;
    node_value[id] = 0;
}
 
void init() {
    nodes = 0;
    clear_node(0);
    anc[0] = -1;
}
 
void add_new_string(const string& w, int cost) {
    int last = 0;
    for (int i = 0; i < (int)w.size(); ++i) {
        int c = w[i] - 'a';
        if (!nxt[last][c]) {
            nxt[last][c] = ++nodes;
            anc_c[nodes] = c;
            anc[nodes] = last;
            clear_node(nodes);
        }
        last = nxt[last][c];
    }
    node_value[last] += cost;
    return;
}
 
int Go(int id, int c) {
    if (go[id][c] != -1)
        return go[id][c];
    if (nxt[id][c])
        return go[id][c] = nxt[id][c];
    if (id == 0)
        return go[id][c] = 0;
    return go[id][c] = Go(get_link(id), c);
}
int get_link(int id) {
    if (link[id] != -1)
        return link[id];
    if (id == 0 || anc[id] == 0)
        return link[id] = 0;
    return link[id] = Go(get_link(anc[id]), anc_c[id]);
}
 
int process_node(int id) {
    if (processed[id])
        return node_price[id];
    processed[id] = true;
    node_price[id] = node_value[id];
    node_price[id] += process_node(get_link(id));
    return node_price[id];
}
 
void build() {
    for (int i = 0; i <= nodes; ++i) {
        processed[i] = false;
        node_price[i] = 0;
    }
    for (int i = 0; i <= nodes; ++i) {
        process_node(i);
    }
}
 
void calc_dp(const string& b) {
    int m = (int)b.size();
    int last = 0;
    for (int i = 1; i <= m; ++i) {
        dp[i] = 0;
        int c = b[i - 1] - 'a';
        last = Go(last, c);
        dp[i] = node_price[last];
    }
    dp[m + 1] = 0;
    for (int i = m; i > 0; --i) {
        dp[i] += dp[i + 1];
    }
    return;
}
 
void solve(int test_id) {
    init();
    string a, b;
    cin >> a >> b;
    int n, m;
    cin >> n;
    m = (int)b.size();
    for (int i = 1; i <= n; ++i) {
        cin >> s[i] >> price[i];
        assert(s[i].size() <= 26);
        add_new_string(s[i], price[i]);
    }
    build();
    calc_dp(b);
    int last = 0;
    long long beauty = 0;
    long long result = 0;
    for (int i = 1; i <= (int)a.size(); ++i) {
        int c = a[i - 1] - 'a';
        last = Go(last, c);
        beauty += node_price[last];
        for (int j = 1; j <= m; ++j) {
            int last_c = last;
            long long beauty_c = beauty;
            for (int k = 0; k < 26; ++k) {
                int r = j + k;
                if (r > m) break;
                int cc = b[r - 1] - 'a';
                last_c = Go(last_c, cc);
                beauty_c += node_price[last_c];
            }
            if (j + 26 <= m)
                beauty_c += dp[j + 26];
            if (beauty_c > result) {
                result = beauty_c;
            }
        }
    }
    cout << result << '\n';
    return;
}
int main(int argc, const char * argv[]) {
    ios_base::sync_with_stdio(false); cin.tie(0); cout.tie(0);
    int tests;
    cin >> tests;
    assert(1 <= tests && tests <= 10);
    for (int i = 0; i < tests; ++i) {
        solve(i);
    }
    return 0;
}

meet2mky · May 11, 2020, 9:32am

I solved it exactly the same way!!!

harisisbatman · May 11, 2020, 10:13am

Can also be solved using just tries without Aho-Corasick. Though the time complexity becomes O(|A|.|B|.|max_length_of_Si|^2) but it can be optimized to pass in the given time constraint. I am guessing a lot of people solved it that way.
I didn’t know about Aho-Corasick. It’s pretty sick. Definitely worth learning.

sharepo · May 11, 2020, 10:24am

@harisisbatman Could you please explain how you optimize this |A| \cdot |B| \cdot 26^2 solution?

silent_killer1 · May 11, 2020, 10:24am

@harisisbatman Hey can you plz tell me your approach for this one because I find the editorial approach a little more complex . I am thinking the way you said but not able to solve it.thankyou.

silent_killer1 · May 11, 2020, 10:28am

@harisisbatman I can tell you what I do.Basically I calculate the answer for A and then B and then combine every possibility of A with B store in vector.Then for every possiblity I just calculate the part which have a substring in array S. If I calculate my time complexity then it should approx (A * B *(26 * 26)) . But it should time out . I didnt use trie because didnt get where I use to optimize that . I am not getting how you are able to optimized it with trie as I am tried so hard but not able to come up with the solution.

leks369 · May 11, 2020, 10:41am

something wrong with the timecomplexity of geeksforgeeks implementation of aho-corasick

harisisbatman · May 11, 2020, 10:45am

For occurrences that are completely in A and completely in B, we can do it with just tires with and extra added factor of 26 in the time complexity but that is no big deal of course. (Let me know if you have confusion on how to do this)
The main part is checking all the substrings and finding the strings starting in A and ending in B.
Naively, each of it can be done in 26*26 time. Lets say we are taking the string A till index i and string B from index j.
First we find all strings in S which start at A[i] and end before B[j+25]
Then we find all strings in S which start at A[i-1] and end before B[j+24] and so on until A[i-24] to B[j]
Now this can be optimized in two ways-

Instead of taking all the different strings in A again and again, we take every different string only once and get its location in trie and then instead of traversing the trie from the top, we traverse it from this location and then for all required strings in B. We do this for strings of length 25, 24, 23 upto 13 in A.
When the length of strings in A become less than 13, it is no longer the most efficient thing to do so now we reverse our strategy and we instead start checking the strings in the reverse order from B to A. So now we take the different strings in B of length 14, 15, . . . 25 and check all required strings of A.

The process is a little overwhelming to code. We need tries of strings in S as well as reversed strings in S and we need to traverse the strings A and B both forwards and backwards.
Heres the code. Its not the neatest code though.
https://www.codechef.com/viewsolution/32986222

The main part of the code is-
B_trie is the trie of normal strings of S.
A_trie is the trie of strings of S in reversed order.

long long bridge_answers[1002][1002];
long long check_all_substrings(){
    long long ans = -1, tmp_ans;
    int i, j, k;
    for(i=0; i<A.length(); i++) for(j=0; j<B.length(); j++) bridge_answers[i][j] = 0;

    for(k=25; k>=13; k--){
        for(i=0+k-1; i<A.length(); i++){

            struct Node *node = B_trie.get_node(A, i-k+1, k, 1);

            for(j=0; j<(int)B.length()-k+1; j++){
                bridge_answers[i][j] += B_trie.get_value(node, B, j, 26-k, 1);
            }
        }
    }

    for(k=1; k<=25; k++){

        for(i=0; i<=(int)B.length()-k; i++){
            struct Node *node = A_trie.get_node(B, i+k-1, k, -1);

            for(j=0; j<A.length(); j++){
                bridge_answers[j][i] += A_trie.get_value(node, A, j, min(26-k, 12), -1);
            }
        }
    }

    for(i=0; i<A.length(); i++){
        for(j=0; j<B.length(); j++){
            ans = max(ans, A_pleasure[i] + B_pleasure[j] + bridge_answers[i][j]);
        }
    }

    return ans;
}

With this, the time-complexity becomes O(|A|.|B|.13.13)

pastor · May 11, 2020, 10:48am

what is the time complexity of the solution??

sharepo · May 11, 2020, 11:01am

Seems like it works right under the TL in the worst case (smth about 0.8s) but passes.

virik_99 · May 11, 2020, 11:04am

The testcases for the first two subtasks seem to be weak. My wrong logic https://www.codechef.com/viewsolution/32961034 fetched me 50 points passing subtask 1 and 2 and only failed on subtask 3, when the only difference between the subtasks is with respect to the time complexity.

harisisbatman · May 11, 2020, 11:06am

Yes… I had to make all other optimizations as well. Like I couldn’t afford to copy strings and then reverse them to serve to the trie which would have made the code a little easier. Had to do everything by reference to the original A and B strings.

harisisbatman · May 11, 2020, 11:09am

I bet the solution in the editorial looks easier after you read mine

chanduds · May 11, 2020, 11:13am

How much time will it take to calculate ratings anyone please?

sharepo · May 11, 2020, 11:13am

It is much easier if you know and understand Aho-Corasick. Thank you for sharing your idea and optimizations.

chanduds · May 11, 2020, 11:17am

How much time will it take to calculate ratings please sharepo?

sharepo · May 11, 2020, 11:17am

It is difficult to create a test for each existing incorrect solution. By the way, the only test you got WA on was a random one. You were lucky and unlucky at the same time.

sharepo · May 11, 2020, 11:18am

I do not know. Ask admins.

harisisbatman · May 11, 2020, 11:18am

My pleasure!
Indeed its easier with Aho-Corasick.

virik_99 · May 11, 2020, 11:24am

I understand, just thought I would point it out. And coincidentally, the way I realized my logic was incorrect was checking my output on a random test case against my brute force solution.