IDXSUM - EDITORIAL

teja349 · June 23, 2019, 1:39pm

PROBLEM LINK:

Practice

Contest: Division 1

Contest: Division 2

Setter: Mohammad Solaiman

Tester: Teja Vardhan Reddy

Editorialist: Teja Vardhan Reddy

DIFFICULTY:

PREREQUISITES:

knuth morris pratt algorithm (kmp), dynamic programming.

PROBLEM:

You will be given a string P of length M consisting of lowercase characters.

Consider all possible strings of length N consisting of lowercase characters. There are 26^N possible such strings. Let’s enumerate them lexicographically starting from 0. For i^{th} string where 0 \leq i \leq 26^N-1, let C_i be the length of the longest prefix of P which is also a suffix of this string.

The value of C_i can be 0 to M. For each possible value k (0 \leq k \leq M), find the sum of the indices i of the strings for which C_i will be equal to this value (let us represent this value by ans[k]).

EXPLANATION

Let us assume we have lps (longest prefix suffix) array of string P constructed (lps array is a common term from kmp algorithm).

Let us represent i length prefix of P with pref_i

Let us assume each prefix(including empty prefix) of a string P is represented by a node.Let us represent prefix of length i with vertex (or node) with label i. Now, we will have n+1 nodes.

Note that lps[0](lps of empty string) is not defined.

Let us construct a tree such that we make parent of vertex i, its lps[i]. vertex 0 will be root of the tree since its lps is not defined for it and it does not have parent while rest all have unique parent.Also,you can easily convince yourself that it will indeed be tree because parent label for a vertex will always be smaller than vertex label. Hence, cycles cannot be formed.

let us define lps_k[i] = lps_{k-1}[lps[i]] if (k \geq 1)
lps_0[i] = i.

lps_k[i] represents k th parent of node i in the tree formed above.

Claim 1: If a string s is a suffix of string p and p is suffix of string t, then s is also a suffix of t. (length(s)<=length(p)<= length(t))

Proof:
Last length(p) characters of p and t are same.
Hence,Last length(s) characters of s and p are same
Hence, last length(s) characters of s and t are same (since length(s)<=length(p)).
Hence, s is also a suffix of t.

Corollary 1: We can see that vertex lps_x[j] for valid x>=0 is always a suffix of vertex j because vertex lps[j] is suffix of vertex j.And vertex lps[lps[j]] is suffix of vertex lps[j]. And applying above claim here, we get vertex lps_2[j] is a suffix of vertex j. Similar we can extend it further.

Claim 2: If a string s is a suffix of string t and p is suffix of string t and (length(s)<=length(p)<= length(t)), then s is suffix of string p also.
Proof:
Last length(p) characters of p and t are same.
Hence,Last length(s) characters of p and t are same (since length(s)<=length(p)).
Last length(s) characters of s and t are same.
Hence, last length(s) characters of s and p are also same. Hence s is suffix of p also.

Claim 3:pref_i is a proper suffix of pref_j (i \lt j) if and only if for some x,lps_x[j]=i.

Proof for forward direction:
Proof by contradiction.
Assume for no value of x, lps_x[j]=i.
We know that for some value of x, lps_x[j] = 0. since pref_0 is root of the tree.
Now, first we will note that lps_x[j] will strictly decrease with x.
Since lps_0[j]=j (\gt i). There exist a value y such that lps_y[j]\lt i \lt lps_{y-1}[j].
Now since pref_{lps_{y-1}[j]} is suffix of pref_j and pref_i is a suffix of pref_j. From claim 2 , i is a suffix of lps_{y-1}[j] and since the lengths are not same, pref_i will be proper suffix of vertex pref_{lps_{y-1}[j]}. Now, since we know that pref_i is a prefix of pref_{lps_{y-1}[j]}. Now i \leq lps_{lps_{y-1}[j]} (= lps_y[j]) because lps_y[j] is the longest prefix suffix of lps_{y-1}[j] which is a contradiction.

Proof for backward direction is exactly corollary 1.

So now, pref_i is suffix of all of the vertices in its subtree and not any other vertices.

Index of length n string can be obtained as \sum_{i=0}^{n-1} (p[i]-a)*26^{n-1-i}. We can see it as contribution for each position in the string. We can see that contribution from ith character is (p[i]-a)*26^{n-1-i}.

Firstly we will solve a easier problem.

Sum of indices of strings of length n such that pref_i is a suffix of it.
To solve this. we can see that all the strings with last i letters exactly same as pref_i will satisfy and these are the only strings that satisfy. So now, number of such strings will be 26^{n-i} because rest (n-i) positions can be anything. Also, suffix of all of them will be same. So contribution from last i letters will be (\sum_{j=0}^{i-1}(p[j]-a)*26^{i-1-j}) in one string since they are equal to pref_i. Across all the 26^{n-i} strings will be equal to (\sum_{j=0}^{i-1}(p[j]-a)*26^{i-1-j})*(26^{n-i}).

And now, lets try to find the contribution from first n-i letters across all the strings,Let us fix the first character, then rest have 26^{n-i-1} choices. So total contribution from first character will be 26^{n-i-1}*(\sum_{ch=a}^{z}(ch-a)*26^{n-1}).

Now similarly contribution from j ( \leq n-i) th character will be 26^{n-i-1}*(\sum_{ch=a}^{z}(ch-a)*26^{n-j}) = 26^{2*n-j-i}*25/2.

Now to get contribution from first n-i characters, we need \sum_{j=1}^{n-i}26^{2*n-j-i}*25/2 = 26^n*(26^{n-i}-1)/2. (there is an easier double counting idea also which I feel is hard to explain here. so lets stick to this proof.)

So now sum of indices of strings with pref_i are suffix will be 26^n*(26^{n-i}-1)/2 +(\sum_{j=0}^{i-1}(p[j]-a)*26^{i-1-j})*(26^{n-i}). We can precompute \sum_{j=0}^{i-1}(p[j]-a)*26^{i-1-j} for all i using dynamic programming in a similar way to how we compute hashes of a string. So once we do the precomputation and also precompute first n powers of 26, we can get 26^n*(26^{n-i}-1)/2 +(\sum_{j=0}^{i-1}(p[j]-a)*26^{i-1-j})*(26^{n-i}) (lets call this foo[i]) in O(1).

Now, required problem differs to the problem we solved only that pref_i is longest prefix which is suffix of P in required.

We can do a easy modification to achieve that. So now, we have also taken strings where some pref_j is longest suffix and pref_i is also suffix (j \gt i). In such cases, pref_i will be suffix of pref_j. So now if we can subtract from foo[i] the sum of indices such that pref_j is longest prefix and pref_i is suffix of pref_j across all j then we will get the required answer.

We have already proved that for vertices in subtree of vertex i, pref_i will be their suffix. Now , since we want (j>i), we only consider proper subtree of i (i.e subtree of i other than i).

So now, we need to iterate over the proper subtree of vertex i (other than i everything else in the subtree). And compute the sum of ans[j] for all vertex j which are in proper subtree of i and subtract that from foo[i] to get ans[i]. This is a standard tree dynamic programming problem.

We can represent dp[i] as sum of ans values of all vertices in subtree of j. Now we can do dfs over the tree and compute ans and dp values for each vertex using their children’s ans and dp values. Hence, this will take O(m) time.

TIME COMPLEXITY

Finding lps array takes O(m) time.

Complexity of dfs is O(m).

Hence, total complexity is O(m)

SOLUTIONS:

Setter's Solution

#include<bits/stdc++.h>
using namespace std;
typedef long long LL;
const LL MOD = 998244353;
//const LL MAXM = 100007;
const LL MAXN = 1000007;
const LL ALPHA = 26;
 
LL sum(LL a, LL b)
{
    return a+b>=MOD?a+b-MOD:a+b;
}
LL sub(LL a, LL b)
{
    return a-b<0?a-b+MOD:a-b;
}
LL mul(LL a, LL b)
{
    return (a*b)%MOD;
}
 
LL pw(LL a, LL b)
{
    if (b==0) return 1;
    LL r = pw(a, b/2);
    r = mul(r, r);
    if (b%2) r = mul(r, a);
    return r;
}
 
vector< int >prefixFunction(const string& s)
{
    /// everything 1-indexed
    /// v[i] = 0 -> empty string matched
    /// v[i] = k -> prefix s[0..(k-1)] matched
 
    int n = s.size();
    int k = 0;
 
    vector< int >v(n+1);
    v[1] = 0;
 
    for (int i = 2; i <= n; i++) {
        while (k > 0 && s[k]!=s[i-1]) k = v[k];
        if (s[k]==s[i-1]) k++;
        v[i] = k;
    }
 
    return v;
}
 
LL power[MAXN];
LL inverse2;
 
void precompute()
{
    power[0] = 1;
    for (int i = 1; i < MAXN; i++) power[i] = mul(power[i-1], ALPHA);
    inverse2 = pw(2, MOD-2);
}
 
void solve(int m, int n, const string &p)
{
    vector<int>pi = prefixFunction(p);
    vector< vector<int> >pimap(m+1);
    for (int i = 1; i <= m; i++) pimap[pi[i]].push_back(i);
 
    vector<LL>v(m+1, 0);
 
    LL matched = 0;
    for (int i = 0; i <= m; i++) {
        if (i > 0) matched = sum(mul(matched, ALPHA), p[i-1]-'a');
 
        LL bamCount = power[n-i];
        LL bamPash = mul(mul(bamCount, sub(bamCount, 1)), inverse2);
        bamPash = mul(bamPash, power[i]);
 
        v[i] = sum(bamPash, mul(matched, bamCount));
    }
 
    for (int i = 0; i <= m; i++) {
        for (int x : pimap[i]) {
            v[i] = sub(v[i], v[x]);
        }
        cout << v[i] << " ";
    }
    cout << "\n";
}
 
int main()
{
    ios::sync_with_stdio(false);
    cin.tie(0);
 
//    freopen("input17.txt", "r", stdin);
//    freopen("output17.txt", "w", stdout);
 
    precompute();
 
    int t;
    cin >> t;
 
    for (int ti = 1; ti <= t; ti++) {
        int m, n;
        cin >> m >> n;
        string p;
        cin >> p;
        solve(m, n, p);
    }
 
    return 0;
}

Tester's Solution

//teja349
#include <bits/stdc++.h>
#include <vector>
#include <set>
#include <map>
#include <string>
#include <cstdio>
#include <cstdlib>
#include <climits>
#include <utility>
#include <algorithm>
#include <cmath>
#include <queue>
#include <stack>
#include <iomanip>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp> 
//setbase - cout << setbase (16); cout << 100 << endl; Prints 64
//setfill -   cout << setfill ('x') << setw (5); cout << 77 << endl; prints xxx77
//setprecision - cout << setprecision (14) << f << endl; Prints x.xxxx
//cout.precision(x)  cout<<fixed<<val;  // prints x digits after decimal in val
 
using namespace std;
using namespace __gnu_pbds;
 
#define f(i,a,b) for(i=a;i<b;i++)
#define rep(i,n) f(i,0,n)
#define fd(i,a,b) for(i=a;i>=b;i--)
#define pb push_back
#define mp make_pair
#define vi vector< int >
#define vl vector< ll >
#define ss second
#define ff first
#define ll long long
#define pii pair< int,int >
#define pll pair< ll,ll >
#define sz(a) a.size()
#define inf (1000*1000*1000+5)
#define all(a) a.begin(),a.end()
#define tri pair<int,pii>
#define vii vector<pii>
#define vll vector<pll>
#define viii vector<tri>
#define mod (998244353)
#define pqueue priority_queue< int >
#define pdqueue priority_queue< int,vi ,greater< int > >
#define flush fflush(stdout) 
#define primeDEN 727999983
mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
 
// find_by_order()  // order_of_key
typedef tree<
int,
null_type,
less<int>,
rb_tree_tag,
tree_order_statistics_node_update>
ordered_set;
 
#define int ll
int pow26[1234567],getans[1234567];
int wow[1234567];
 
const int MAXN=2e5;
 
int lps[MAXN];
int constrlps(string s){
    int i,j;
    lps[0]=0;
    i=1;
    j=0;
    while(i<s.length()){
        if(s[i]==s[j]){
            lps[i]=j+1;
            i++;
            j++;
        }
        else{
            if(j!=0){
                j=lps[j-1];
            }
            else{
                lps[i]=0;
                i++;
            }
        }
    }
    return 0;
}   
 
ll gg[1234567];
 
main(){
    std::ios::sync_with_stdio(false); cin.tie(NULL);
    int t;
    cin>>t;
    int i;
    pow26[0]=1;
    getans[0]=0;
    ll inv2 = 499122177;
    f(i,1,1234567){
        pow26[i]=pow26[i-1]*26;
        pow26[i]%=mod;
        getans[i]=pow26[i]*(pow26[i]-1);
        getans[i]%=mod;
        getans[i]*=inv2;
        getans[i]%=mod;
    }
    while(t--){
        int m,n;
        cin>>m>>n;
        string p;
        cin>>p;
        ll inv26 = 729486258;
        ll ans=0;
        constrlps(p);
        rep(i,m){
            gg[i]=0;
            ans=ans*26+p[i]-'a';
            ans%=mod;
        }   
        gg[m]=0;
        fd(i,m,0){
            //cout<<ans<<endl;
            wow[i]=ans*pow26[n-i]+getans[n-i]*pow26[i];
            wow[i]-=gg[i];
            wow[i]%=mod;
            wow[i]+=mod;
            wow[i]%=mod;
            if(i>0){
                ans-=p[i-1]-'a';
                ans*=inv26;
                ans%=mod;
                ans+=mod;
                ans%=mod;
                gg[lps[i-1]]+=wow[i]+gg[i];
                gg[lps[i-1]]%=mod;
            }
        }
        ll sumi=0;
        rep(i,m+1){
            cout<<wow[i]<<" ";
            sumi+=wow[i];
            sumi%=mod;
        }
        assert(sumi==getans[n]);
        cout<<endl;
 
    }
    return 0;   
}

Feel free to Share your approach, If it differs. Suggestions are always welcomed.

akashbhalotia · June 23, 2019, 7:26pm

Thanks for the fast editorials

huzaifa242 · June 24, 2019, 10:29am

I have visualised this problem in a bit different way.

We are given n, Thus we can see all the strings as numbers of n digits in base 26.

Let us discuss strategy for 1<=i<=m. We’ll consider i=0 afterwards.

for each i, we can think that we need to sum all the numbers whose last i digits are same as first i digits of given string.

we can simply convert string to number in following way:

0000 ----------> aaaa
0001----------->aaab
0002----------->aaac
|
|
000 25-------->aaaz
and so on

Now lets take an example for i==1.
let s[i]=‘a’
Thus we need to sum all the indices which end with ‘a’ or ‘0’ in base 26.

now such first number(in base 26) occurs at index 0 (aaaa → 0000)

next such number occurs at index 26(aaba–> 0010)

next again at index 52 (aaca → 0020)

Each index occurs after interval of 26

Thus we see that they form an Arithmetic Progression with common difference =26.

To be more clear let’s Take example for i=2

Let us assume prefix = “ab” Hence we need to sum all the indices of Numbers(in base 26) whose last digits are “01”

First such number occurs at index 1 (0001 → aaab)

next number occurs at index 677 (0101->abab)

next number at index 1353 (0201->acab)

Here also If we observe The common difference is 676=(26*26)

Hence I Think we all agree on the fact that answer to each prefix of length 1<=i<=m

is the sum of Arithmetic progression where common difference is (26)^i.

Now what remains to determine is the First Term of AP and Number of Terms in AP.

Lets find Number of Terms in AP for 1<=i<=m

So Total Number of Terms where last i digits are fixed is given by (26)^(n-i)

Hence The Number of Terms in Each AP is Governed by formulae (26)^(n-i) for each 1<=i<=m

Now Let’s Find First Term of AP for each 1<=i<=m

It looks Very straight forward. The First Term of AP is the index of First Occurrence of Prefix of length i as suffix. Now Calculating this Index is very easy. Since We have converted each string to base 26, The Index of first Occurrence is simply the to covert base 26 Number to base 10.

For eg. i=2 Prefix =“ab” Then In base 26 ab=(01) So simply convert this 01 in base 26 to base 10. This would Be our First Term in AP.

I think, we are clear with Trivial Cases. But there is one more Observation to be dealt with.

Let’s say for i=3 prefix=“aaa” Now for Such cases, There would be some mistake in counting. because for i=2 our prefix would be “aa” and we would count all the strings ending with “00” in base 26.

But in our counting we would also count the strings that end with “000” which is wrong.

So The only thing we are left with is to Remove Such Duplicating Count.

Here We will take help of Shift Table of KMP algorithm.

Just to Ponder, Shift Table of KMP Algorithm Stores the Length of Longest Prefix That is also Suffix of Current String interval.

So How we would use this Shift Table?

Well Firstly Calculate AP sum for 1<=i<=m
Then Just Subtract from AP_sum[shift_table[i]] the value AP_Sum[i];

=> AP_sum[shift_table[i]] -=AP_sum[i]

Why This Works?

Prefix of Length i will have a Prefix of Length Shift_table[i] That has been wrongly Counted.

Let’s Clear it with our Example of prefix=“aaa” Now as we talked earlier, I=2 will wrongly count some String which have prefix of length of 3. So Using Shift Table, We will Simply Remove These Redundancy.

AP_sum[1]=AP_sum[1]-AP_sum[2]
AP_sum[2]=AP_sum[2]-AP_sum[3]

Thus Now we can Say That all the count of 1<=i<=m are correct and there is No Duplication.

We need to Now Simply Calculate for i=0.

Any Guess?

Yes, Sum all the Values of AP_sum from 1<=i<=m And Find Sum from 0 to ((26)^(n)) -1

So i=0 Count is (Sum from 0 to ((26)^(n)) -1) - Sum of All AP_sum from 1<=i<=m

Link to My Solution Using Above Explained Approach

Feel Free to Ping me If you Have any Doubt In my Approach.