CODON - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3

Author: Le Duc Minh, Nguyen Anh Quân
Testers: Shubham Anand Jain, Aryan
Editorialist: Nishank Suresh

DIFFICULTY:

Easy-Medium

PREREQUISITES:

Graphs, Dynamic Programming

PROBLEM:

You are given an undirected multigraph without self-loops. Each node has a lowercase alphabet written on it.
You also have a string S of lowercase letters, of length L.
A beautiful path is a sequence of L-1 edges, such that there is a sequence of L nodes satisfying the following:

  • For each valid i, the i-th edge connects the i-th and (i+1)-th of these nodes
  • for each valid i, the i-th character of S is written in the i-th of these nodes
    A path may visit any given edge or vertex arbitrarily many times.
    Determine the number of beautiful paths in the graph, modulo 10^9 + 7.

EXPLANATION:

Let the character written on node i be c_i.

At first glance, it seems not very easy to combinatorially count the number of such paths.
So, as is common in counting problems where direct combinatorics is hard, we use dynamic programming.
Any dp is determined by its states and transitions. What could those be in this case?

States

Let dp[i][j] denote the number of beautiful paths of j edges matching the first j+1 characters of S, such that the path ends at vertex i.

Transitions

The base case here is dp[i][0] = 1 if S[0] = c_i, and dp[i][0] = 0 otherwise. (do you see why?)
Now, let us look at j > 0.
If c_i \neq S[j], dp[i][j] = 0 - if the last characters don’t match, no path is going to be good.
So, let c_i = S[j].
Consider any beautiful path of j edges ending at vertex i, and let the last edge traversed in this path be e = (u,i).
Clearly, the first j-1 edges of this path are then a beautiful path of j-1 edges, ending at u - and conversely, any such path (of which there are dp[u][j-1]) can be extended to a beautiful path of j edges by adding e to it.
In other words, dp[i][j] = \sum dp[v][j-1] over all edges (i, v) incident to i.
Note that if there multiple edges between i and v, we add dp[v][j-1] for each of these edges - because those are distinct paths by definition.

So, we have our solution:

  • Initialize dp[i][0] = 1 if S[0] = c_i, and dp[i][0] = 0 otherwise.
  • For j > 0, dp[i][j] = \sum dp[v][j-1] over all edges (i, v) incident to i.
    This can be implemented iteratively by first iterating over the length from 1 to L-1, then iterating over vertices from 1 to N, and finally over edges incident to each vertex.

The final answer is \sum_{i=1}^n dp[i][l-1] - the number of beautiful paths of L-1 edges ending anywhere.

What about the time complexity of this solution?
The dp table has N*L states, and for each state (i, j) we have to visit all the neighbours of i, of which there can be \mathcal{O}(N). So, is it
\mathcal{O}(LN^2))?
No!
Looking at the transition carefully, we see that, for a given length j, we visit every edge (u, v) exactly twice - once from u, and once from v. So,
rather than N^2, the number of operations for a given length is bounded by O(N+M) instead - thus giving a time complexity of \mathcal{O}(L(N+M)).

However, there is one final issue left to deal with.
Suppose S is just a single character (say, a) repeated L times, and u and v are two nodes connected by an edge e, such that c_u = c_v = a.
Then, one possible path is to just repeat e L-1 times. This creates a problem, however - this path is being counted twice - once when ending at u, and once when ending at v (i.e, the vertex labels of the path can be u - v - u - ... or v - u - v - ...). We don’t want that to happen.

How to fix this?

First, notice that this case can only possibly show up when S is a single character repeated L times - if it has at least two distinct characters, this problem will not arise.
Second, any path which visits at least 3 distinct vertices has a distinct vertex sequence labelling it (this also proves the first point).
So, we only need to consider the case when S is a single character repeated L times, and the path is a sequence of edges between the same two vertices - each such path will be counted exactly twice, so we need to remove this overcounting.
How many such paths are possible? Between vertices u and v such that c_u = c_v = S[0], if there are k edges, there are k^{L-1} possible paths of this form - for each edge of the path, choose any one of the k edges. Each of these is counted twice, so we simply subtract k^{L-1} from the answer.
Make sure that you don’t oversubtract by considering both (u, v) and (v, u) - for example, only process pairs where u < v.

This takes \mathcal{O}(N^2) time.

TIME COMPLEXITY

\mathcal{O}(L(N+M) + N^2) per testcase.

SOLUTIONS:

Setter's Solution
#include<bits/stdc++.h>
using namespace std;
 
const int MOD = 1e9 + 7;
 
const int maxN = 1e3 + 5;
const int maxM = 1e3 + 5;
const int maxL = 1e3 + 5;
 
int T;
int n, m, l;
string s;
string c;
int tmp[maxM];
vector<int> adj[maxN];
int cntEdges[maxN][maxN];
int dp[maxL][maxN];
int p[maxM];
 
int power(int x, int y) {
    if (y == 0) return 1;
    int k = power(x, y / 2);
    return 1ll * k * k % MOD * (y & 1 ? x : 1) % MOD;
}
 
void reset() {
    for (int u = 1; u <= n; u++) {
        adj[u].clear();
        for (int v = 1; v <= n; v++) {
            cntEdges[u][v] = 0;
        }
    }
 
    for (int i = 1; i <= l; i++) {
        for (int u = 1; u <= n; u++) {
            dp[i][u] = 0;
        }
    }
}
 
int main() {
    ios::sync_with_stdio(0);
    cin.tie(0); cout.tie(0);
 
    cin >> T;
 
    while (T--) {
        cin >> n >> m >> l;
        cin >> s;
        cin >> c;
 
        for (int i = 1; i <= m; i++) cin >> tmp[i];
        for (int i = 1; i <= m; i++) {
            int u = tmp[i], v;
            cin >> v;
            adj[u].push_back(v);
            adj[v].push_back(u);
            if (u > v) swap(u, v);
            cntEdges[u][v]++;
        }
        s = ' ' + s;
        c = ' ' + c;
 
        for (int u = 1; u <= n; u++) {
            if (s[1] == c[u]) {
                dp[1][u] = 1;
            }
        }
 
        for (int i = 2; i <= l; i++) {
            for (int u = 1; u <= n; u++) {
                if (s[i] == c[u]) {
                    for (int v : adj[u]) {
                        dp[i][u] = (dp[i][u] + dp[i - 1][v]) % MOD;
                    }
                }
            }
        }
 
        int ans = 0;
        for (int u = 1; u <= n; u++) {
            ans = (ans + dp[l][u]) % MOD;
        }
 
        int same = 1;
        for (int i = 2; i < s.size(); i++) {
            if (s[i] != s[1]) {
                same = 0;
                break;
            }
        }
        if (same) {
            for (int i = 0; i <= m; i++) {
                p[i] = power(i, l - 1);
            }
            for (int u = 1; u <= n; u++) {
                if (c[u] != s[1]) continue;
                for (int v = u + 1; v <= n; v++) {
                    if (c[v] != s[1]) continue;
                    ans -= p[cntEdges[u][v]];
                    if (ans < 0) ans += MOD;
                }
            }
        }
 
        cout << ans << '\n';
 
        reset();
    }
}
Tester's Solution
//By TheOneYouWant
#pragma GCC optimize ("-O2")
#include <bits/stdc++.h>
using namespace std;
#define fastio ios_base::sync_with_stdio(0);cin.tie(0)
 
const long long int MOD = 1e9+7;
 
long long int fastpow(long long int a, long long int p){
    if(a == 0) return 0;
    if(p == 0) return 1;
    long long int z = fastpow(a, p/2);
    z = (z * z) % MOD;
    if(p % 2) z = (a * z) % MOD;
    return z;
}
 
int main(){
    fastio;
 
    int tests;
    cin>>tests;
 
    while(tests--){
        int n, m, l;
        cin>>n>>m>>l;
 
        string s1, s2;
        cin>>s1>>s2;
 
        vector<int> adj[n]; // adjacency list
        int val = s1[0];
        bool same = 1;
        long long int pre[m+1] = {0};
        for(int i = 0; i <= m; i++){
            pre[i] = fastpow(i, l-1); // precompute i^(l-1)
        }
 
        // check if all values of s1 are same
        for(int i = 0; i < l; i++){
            if(s1[i] != val) same = 0;
        } 
 
        map<pair<int,int>,int> num; // to store the edge weights
        int u[m], v[m];
        for(int i = 0; i < m; i++){
            cin>>u[i];
        }
        for(int i = 0; i < m; i++){
            cin>>v[i];
        }
        for(int i = 0; i < m; i++){
            u[i]--; v[i]--;
            if(u[i]>v[i]) swap(u[i],v[i]);
            num[make_pair(u[i],v[i])]++;
            adj[u[i]].push_back(v[i]);
            adj[v[i]].push_back(u[i]);
        }
 
        long long int dp[n][2];
        for(int i = 0; i < n; i++) 
            for(int j = 0; j < 2; j++) 
                dp[i][j] = 0;
 
        for(int i = 0; i < n; i++){
            if(s2[i] == s1[0]) dp[i][0] = 1; // initial value
        }
 
        for(int i = 1; i < l; i++){
            int curr = i % 2;
            int prev = 1 - curr;
            for(int j = 0; j < n; j++) dp[j][curr] = 0;
            for(int j = 0; j < n; j++){
                for(auto & r : adj[j]){
                    if(s2[r] == s1[i]){
                        dp[r][curr] = (dp[r][curr] + dp[j][prev]) % MOD;    
                    } 
                }
            }
        }
 
        long long int ans = 0;
        int laa = (l-1) % 2;
        for(int i = 0; i < n; i++) ans += dp[i][laa];
        ans %= MOD;
        for(int i = 0; i < n; i++){
            for(int j = i+1; j < n; j++){
                if((same == 1) && (s2[i] == s2[j]) && (s2[i] == val)){
                    ans -= pre[num[make_pair(i,j)]];
                    ans %= MOD;
                    if(ans < 0) ans += MOD;
                }
            }
        }
 
        cout<<ans<<endl;
    }
 
    return 0;
}
Editorialist's Solution
#include "bits/stdc++.h"
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("sse,sse2,sse3,ssse3,sse4,popcnt,mmx,avx,avx2")
using namespace std;
using ll = long long;
 
const int MOD = 1e9+7;
ll mpow(ll a, ll n)
{
    if (a == 0) return 0;
    ll r = 1;
    while (n) {
        if (n&1) r = (r*a)%MOD;
        a = (a*a)%MOD;
        n >>= 1;
    }
    return r;
}
 
int main()
{
    ios::sync_with_stdio(0); cin.tie(0);
    mt19937_64 rng(chrono::high_resolution_clock::now().time_since_epoch().count());
 
 
    int q; cin >> q;
    while (q--) {
        int n, m, l; cin >> n >> m >> l;
        string s; cin >> s;
        string chr; cin >> chr;
        vector adj(n, vector<int>());
        vector ct(n, vector(n, 0));
        vector<int> u(m), v(m);
        for (int i = 0; i < m; ++i) {
            cin >> u[i]; --u[i];
        }
        for (int i = 0; i < m; ++i) {
            cin >> v[i]; --v[i];
            adj[u[i]].push_back(v[i]);
            adj[v[i]].push_back(u[i]);
            if (u[i] > v[i]) swap(u[i], v[i]);
            ++ct[u[i]][v[i]];
        }
 
        vector dp(n, vector(l, 0LL));
 
        // dp[i][j] -> number of good walks ending at i, using j edges
        for (int i = 0; i < n; ++i)
            dp[i][0] = chr[i] == s[0];
 
        for (int len = 1; len < l; ++len) {
            for (int i = 0; i < n; ++i) {
                if (s[len] != chr[i]) continue;
                for (int x : adj[i]) {
                    dp[i][len] = (dp[i][len] + dp[x][len-1])%MOD;
                }
            }
        }
        ll ans = 0;
        for (int i = 0; i < n; ++i)
            ans = (ans + dp[i][l-1])%MOD;
 
        if (*min_element(begin(s), end(s)) == *max_element(begin(s), end(s))) {
            for (int i = 0; i < n; ++i) {
                for (int j = i+1; j < n; ++j) {
                    if (chr[i] == s[0] and chr[j] == s[0])
                        ans = (ans + MOD - mpow(ct[i][j], l-1))%MOD;
                }
            }
        }
        cout << ans << '\n';
    }
}
5 Likes

Why is the path u->v->u->v->u… considered to be the same as v->u->v->u->v… ?
Aren’t they different as u!=v ? Why do we need to subtract them?
If a string is palindrome are we counting paths twice the actual number?

1 Like

This is a matter of definition - note that a beautiful path is defined as a sequence of L-1 edges for which there exists a sequence of L vertices such that certain conditions are satisfied. It is not a sequence of vertices following those conditions.

The second sample makes this clear - there two valid sequences of vertices (1, 2) and (2, 1), but only one valid edge sequence - the single edge taken once. The output is thus 1, not 2.
Hopefully that clears it up.

3 Likes

@iceknight1093 can you please check my submission once. It is only giving WA when the string is a palindrome. How will the overall answer get affected if the string is a palindrome and we solve ignoring this fact? I divided the answer by 2, but it didn’t work. CodeChef: Practical coding for everyone

Please read the last part of the editorial, where I explain when and why this happens, and how to fix it. Just dividing by 2 will not work - consider the case

1
3 5 4
aaaa
aaa
1 1 1 1 2
2 2 2 2 3

whose answer is 105 and not 85.

1 Like

Okay so the problem is with string having similar characters and not palindrome. The editorial explains it well. Thank You.

1
3 2 3
ede
ede
1 2
2 3

In this test case, why do we have to count paths (1 , 2 , 3) and (3 , 2 ,1) different.

Because they are different. As I explained above, note that paths here are defined purely in terms of the sequence of edges.
In your example, there are 2 edges: e_1 between (1, 2) and e_2 between (2, 3).
The two sequences e_1 - e_2 and e_2 - e_1 are different because the order of edges in them is different.

1 Like

can someone explain me what’s wrong in this my testcase 2 is running but 1 isn’t
https://www.codechef.com/viewsolution/44227416

I think dp[i][j] = summation (dp[v][j-1]*cntedge[i][v]) as there are cntedge[i][v] ways to get to node v from i. Here is my code CodeChef: Practical coding for everyone

LOL I didn’t read the input format correctly. My code works fine.

ohh now I got it, in single character string the the edges will be same between the two nodes. And I was trying to get rid of the second path in all the palindromes (which was pain in a**).

Hi, can anybody tell me how to optimize my solution: CodeChef: Practical coding for everyone
It gives TLE for 1st TC and AC for 2nd TC.

I can’t understand the first 2 statements of “How to fix this” part. Can anyone please elaborate it.

1 Like

Think about when a beautiful path gets counted twice.
By definition (in this problem), a beautiful path is a sequence of edges such that some conditions are fulfilled.
However, our dp counts paths in terms of vertices - we consider which vertex a path can end at. This difference is what creates a problem - if there is a sequence of edges which can be represented by two different vertex sequences, we count that path twice - when in reality it should only be counted once.
The simplest example of when this arises is the second sample input - there is only one edge, (1, 2), but it is counted twice as the vertex sequences 1\rightarrow 2 and 2\rightarrow 1.

When can a path have two (or more) labellings?
Say we have an edge sequence (e_1, e_2, e_3, ...) which has two different vertex sequences, say a\rightarrow b\rightarrow c\rightarrow d\rightarrow ... and w\rightarrow x\rightarrow y\rightarrow z\rightarrow ...
Try to convince yourself that the only time this can happen is if the sequences are a\rightarrow b\rightarrow a\rightarrow b\rightarrow ... and b\rightarrow a\rightarrow b\rightarrow a\rightarrow ...

Proof

The endpoints of e_1 are a and b. So, either (w,x) = (a,b), or (w, x) = (b,a).

  • If (w,x) = (a,b), y must be the other endpoint of e_2, which is c. Then, z must be the other endpoint of e_3, which is d. If you continue this, you see that the two vertex sequences are actually equal. So this case can’t happen.
  • If (w, x) = (b, a), that would mean a is an endpoint of e_2…but so is b. So the only option is y = b. And again, this makes the endpoints of e_3 to be a and b, so z = a. Also note that this means we have c = a, d = b, and so on - so both sequences are an alternating sequence of vertices.

This also proves that if you count a path multiple times, you will count it exactly twice - once ending at a and once at b.
And of course, when your sequence alternates between the same two vertices, and you can start with either one, they must have the same character written on them - and this would mean that S is just the same character repeated multiple times.

1 Like

Won’t this solution count regular palindromes twice as well?
for example if the nodes are “abba” and the word is “abba” and each node is only connected to the next node, then wont two paths be counted?
Here is the testcase I am talking about
1
4 3 4
abba
abba
1 2 3
2 3 4

This counts the path 1->2->3->4 and 4->3->2->1 as distinct paths right? Shouldn’t it therefore give a WA?

the problem talks about sequence of edges, so it is all right to count them twice.

But then doesnt that also hold true for the the special case given in the editorial where the string only contains one character repeated L times? The repeat cases should then be considered distinct because of the sequencing. Can someone help me out? Sorry i’m a bit confused :sweat_smile: