PROBLEM LINK:
Setter: Mohammad Solaiman
Tester: Teja Vardhan Reddy
Editorialist: Teja Vardhan Reddy
DIFFICULTY:
PREREQUISITES:
knuth morris pratt algorithm (kmp), dynamic programming.
PROBLEM:
You will be given a string P of length M consisting of lowercase characters.
Consider all possible strings of length N consisting of lowercase characters. There are 26^N possible such strings. Let’s enumerate them lexicographically starting from 0. For i^{th} string where 0 \leq i \leq 26^N-1, let C_i be the length of the longest prefix of P which is also a suffix of this string.
The value of C_i can be 0 to M. For each possible value k (0 \leq k \leq M), find the sum of the indices i of the strings for which C_i will be equal to this value (let us represent this value by ans[k]).
EXPLANATION
Let us assume we have lps (longest prefix suffix) array of string P constructed (lps array is a common term from kmp algorithm).
Let us represent i length prefix of P with pref_i
Let us assume each prefix(including empty prefix) of a string P is represented by a node.Let us represent prefix of length i with vertex (or node) with label i. Now, we will have n+1 nodes.
Note that lps[0](lps of empty string) is not defined.
Let us construct a tree such that we make parent of vertex i, its lps[i]. vertex 0 will be root of the tree since its lps is not defined for it and it does not have parent while rest all have unique parent.Also,you can easily convince yourself that it will indeed be tree because parent label for a vertex will always be smaller than vertex label. Hence, cycles cannot be formed.
let us define lps_k[i] = lps_{k-1}[lps[i]] if (k \geq 1)
lps_0[i] = i.
lps_k[i] represents k th parent of node i in the tree formed above.
Claim 1: If a string s is a suffix of string p and p is suffix of string t, then s is also a suffix of t. (length(s)<=length(p)<= length(t))
Proof:
Last length(p) characters of p and t are same.
Hence,Last length(s) characters of s and p are same
Hence, last length(s) characters of s and t are same (since length(s)<=length(p)).
Hence, s is also a suffix of t.
Corollary 1: We can see that vertex lps_x[j] for valid x>=0 is always a suffix of vertex j because vertex lps[j] is suffix of vertex j.And vertex lps[lps[j]] is suffix of vertex lps[j]. And applying above claim here, we get vertex lps_2[j] is a suffix of vertex j. Similar we can extend it further.
Claim 2: If a string s is a suffix of string t and p is suffix of string t and (length(s)<=length(p)<= length(t)), then s is suffix of string p also.
Proof:
Last length(p) characters of p and t are same.
Hence,Last length(s) characters of p and t are same (since length(s)<=length(p)).
Last length(s) characters of s and t are same.
Hence, last length(s) characters of s and p are also same. Hence s is suffix of p also.
Claim 3:pref_i is a proper suffix of pref_j (i \lt j) if and only if for some x,lps_x[j]=i.
Proof for forward direction:
Proof by contradiction.
Assume for no value of x, lps_x[j]=i.
We know that for some value of x, lps_x[j] = 0. since pref_0 is root of the tree.
Now, first we will note that lps_x[j] will strictly decrease with x.
Since lps_0[j]=j (\gt i). There exist a value y such that lps_y[j]\lt i \lt lps_{y-1}[j].
Now since pref_{lps_{y-1}[j]} is suffix of pref_j and pref_i is a suffix of pref_j. From claim 2 , i is a suffix of lps_{y-1}[j] and since the lengths are not same, pref_i will be proper suffix of vertex pref_{lps_{y-1}[j]}. Now, since we know that pref_i is a prefix of pref_{lps_{y-1}[j]}. Now i \leq lps_{lps_{y-1}[j]}
(= lps_y[j]) because lps_y[j] is the longest prefix suffix of lps_{y-1}[j] which is a contradiction.
Proof for backward direction is exactly corollary 1.
So now, pref_i is suffix of all of the vertices in its subtree and not any other vertices.
Index of length n string can be obtained as \sum_{i=0}^{n-1} (p[i]-a)*26^{n-1-i}. We can see it as contribution for each position in the string. We can see that contribution from ith character is (p[i]-a)*26^{n-1-i}.
Firstly we will solve a easier problem.
Sum of indices of strings of length n such that pref_i is a suffix of it.
To solve this. we can see that all the strings with last i letters exactly same as pref_i will satisfy and these are the only strings that satisfy. So now, number of such strings will be 26^{n-i} because rest (n-i) positions can be anything. Also, suffix of all of them will be same. So contribution from last i letters will be (\sum_{j=0}^{i-1}(p[j]-a)*26^{i-1-j}) in one string since they are equal to pref_i. Across all the 26^{n-i} strings will be equal to (\sum_{j=0}^{i-1}(p[j]-a)*26^{i-1-j})*(26^{n-i}).
And now, lets try to find the contribution from first n-i letters across all the strings,Let us fix the first character, then rest have 26^{n-i-1} choices. So total contribution from first character will be 26^{n-i-1}*(\sum_{ch=a}^{z}(ch-a)*26^{n-1}).
Now similarly contribution from j ( \leq n-i) th character will be 26^{n-i-1}*(\sum_{ch=a}^{z}(ch-a)*26^{n-j}) = 26^{2*n-j-i}*25/2.
Now to get contribution from first n-i characters, we need \sum_{j=1}^{n-i}26^{2*n-j-i}*25/2 = 26^n*(26^{n-i}-1)/2. (there is an easier double counting idea also which I feel is hard to explain here. so lets stick to this proof.)
So now sum of indices of strings with pref_i are suffix will be 26^n*(26^{n-i}-1)/2 +(\sum_{j=0}^{i-1}(p[j]-a)*26^{i-1-j})*(26^{n-i}). We can precompute \sum_{j=0}^{i-1}(p[j]-a)*26^{i-1-j} for all i using dynamic programming in a similar way to how we compute hashes of a string. So once we do the precomputation and also precompute first n powers of 26, we can get 26^n*(26^{n-i}-1)/2 +(\sum_{j=0}^{i-1}(p[j]-a)*26^{i-1-j})*(26^{n-i}) (lets call this foo[i]) in O(1).
Now, required problem differs to the problem we solved only that pref_i is longest prefix which is suffix of P in required.
We can do a easy modification to achieve that. So now, we have also taken strings where some pref_j is longest suffix and pref_i is also suffix (j \gt i). In such cases, pref_i will be suffix of pref_j. So now if we can subtract from foo[i] the sum of indices such that pref_j is longest prefix and pref_i is suffix of pref_j across all j then we will get the required answer.
We have already proved that for vertices in subtree of vertex i, pref_i will be their suffix. Now , since we want (j>i), we only consider proper subtree of i (i.e subtree of i other than i).
So now, we need to iterate over the proper subtree of vertex i (other than i everything else in the subtree). And compute the sum of ans[j] for all vertex j which are in proper subtree of i and subtract that from foo[i] to get ans[i]. This is a standard tree dynamic programming problem.
We can represent dp[i] as sum of ans values of all vertices in subtree of j. Now we can do dfs over the tree and compute ans and dp values for each vertex using their children’s ans and dp values. Hence, this will take O(m) time.
TIME COMPLEXITY
Finding lps array takes O(m) time.
Complexity of dfs is O(m).
Hence, total complexity is O(m)
SOLUTIONS:
Setter's Solution
#include<bits/stdc++.h>
using namespace std;
typedef long long LL;
const LL MOD = 998244353;
//const LL MAXM = 100007;
const LL MAXN = 1000007;
const LL ALPHA = 26;
LL sum(LL a, LL b)
{
return a+b>=MOD?a+b-MOD:a+b;
}
LL sub(LL a, LL b)
{
return a-b<0?a-b+MOD:a-b;
}
LL mul(LL a, LL b)
{
return (a*b)%MOD;
}
LL pw(LL a, LL b)
{
if (b==0) return 1;
LL r = pw(a, b/2);
r = mul(r, r);
if (b%2) r = mul(r, a);
return r;
}
vector< int >prefixFunction(const string& s)
{
/// everything 1-indexed
/// v[i] = 0 -> empty string matched
/// v[i] = k -> prefix s[0..(k-1)] matched
int n = s.size();
int k = 0;
vector< int >v(n+1);
v[1] = 0;
for (int i = 2; i <= n; i++) {
while (k > 0 && s[k]!=s[i-1]) k = v[k];
if (s[k]==s[i-1]) k++;
v[i] = k;
}
return v;
}
LL power[MAXN];
LL inverse2;
void precompute()
{
power[0] = 1;
for (int i = 1; i < MAXN; i++) power[i] = mul(power[i-1], ALPHA);
inverse2 = pw(2, MOD-2);
}
void solve(int m, int n, const string &p)
{
vector<int>pi = prefixFunction(p);
vector< vector<int> >pimap(m+1);
for (int i = 1; i <= m; i++) pimap[pi[i]].push_back(i);
vector<LL>v(m+1, 0);
LL matched = 0;
for (int i = 0; i <= m; i++) {
if (i > 0) matched = sum(mul(matched, ALPHA), p[i-1]-'a');
LL bamCount = power[n-i];
LL bamPash = mul(mul(bamCount, sub(bamCount, 1)), inverse2);
bamPash = mul(bamPash, power[i]);
v[i] = sum(bamPash, mul(matched, bamCount));
}
for (int i = 0; i <= m; i++) {
for (int x : pimap[i]) {
v[i] = sub(v[i], v[x]);
}
cout << v[i] << " ";
}
cout << "\n";
}
int main()
{
ios::sync_with_stdio(false);
cin.tie(0);
// freopen("input17.txt", "r", stdin);
// freopen("output17.txt", "w", stdout);
precompute();
int t;
cin >> t;
for (int ti = 1; ti <= t; ti++) {
int m, n;
cin >> m >> n;
string p;
cin >> p;
solve(m, n, p);
}
return 0;
}
Tester's Solution
//teja349
#include <bits/stdc++.h>
#include <vector>
#include <set>
#include <map>
#include <string>
#include <cstdio>
#include <cstdlib>
#include <climits>
#include <utility>
#include <algorithm>
#include <cmath>
#include <queue>
#include <stack>
#include <iomanip>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
//setbase - cout << setbase (16); cout << 100 << endl; Prints 64
//setfill - cout << setfill ('x') << setw (5); cout << 77 << endl; prints xxx77
//setprecision - cout << setprecision (14) << f << endl; Prints x.xxxx
//cout.precision(x) cout<<fixed<<val; // prints x digits after decimal in val
using namespace std;
using namespace __gnu_pbds;
#define f(i,a,b) for(i=a;i<b;i++)
#define rep(i,n) f(i,0,n)
#define fd(i,a,b) for(i=a;i>=b;i--)
#define pb push_back
#define mp make_pair
#define vi vector< int >
#define vl vector< ll >
#define ss second
#define ff first
#define ll long long
#define pii pair< int,int >
#define pll pair< ll,ll >
#define sz(a) a.size()
#define inf (1000*1000*1000+5)
#define all(a) a.begin(),a.end()
#define tri pair<int,pii>
#define vii vector<pii>
#define vll vector<pll>
#define viii vector<tri>
#define mod (998244353)
#define pqueue priority_queue< int >
#define pdqueue priority_queue< int,vi ,greater< int > >
#define flush fflush(stdout)
#define primeDEN 727999983
mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
// find_by_order() // order_of_key
typedef tree<
int,
null_type,
less<int>,
rb_tree_tag,
tree_order_statistics_node_update>
ordered_set;
#define int ll
int pow26[1234567],getans[1234567];
int wow[1234567];
const int MAXN=2e5;
int lps[MAXN];
int constrlps(string s){
int i,j;
lps[0]=0;
i=1;
j=0;
while(i<s.length()){
if(s[i]==s[j]){
lps[i]=j+1;
i++;
j++;
}
else{
if(j!=0){
j=lps[j-1];
}
else{
lps[i]=0;
i++;
}
}
}
return 0;
}
ll gg[1234567];
main(){
std::ios::sync_with_stdio(false); cin.tie(NULL);
int t;
cin>>t;
int i;
pow26[0]=1;
getans[0]=0;
ll inv2 = 499122177;
f(i,1,1234567){
pow26[i]=pow26[i-1]*26;
pow26[i]%=mod;
getans[i]=pow26[i]*(pow26[i]-1);
getans[i]%=mod;
getans[i]*=inv2;
getans[i]%=mod;
}
while(t--){
int m,n;
cin>>m>>n;
string p;
cin>>p;
ll inv26 = 729486258;
ll ans=0;
constrlps(p);
rep(i,m){
gg[i]=0;
ans=ans*26+p[i]-'a';
ans%=mod;
}
gg[m]=0;
fd(i,m,0){
//cout<<ans<<endl;
wow[i]=ans*pow26[n-i]+getans[n-i]*pow26[i];
wow[i]-=gg[i];
wow[i]%=mod;
wow[i]+=mod;
wow[i]%=mod;
if(i>0){
ans-=p[i-1]-'a';
ans*=inv26;
ans%=mod;
ans+=mod;
ans%=mod;
gg[lps[i-1]]+=wow[i]+gg[i];
gg[lps[i-1]]%=mod;
}
}
ll sumi=0;
rep(i,m+1){
cout<<wow[i]<<" ";
sumi+=wow[i];
sumi%=mod;
}
assert(sumi==getans[n]);
cout<<endl;
}
return 0;
}
Feel free to Share your approach, If it differs. Suggestions are always welcomed.