SHIFTPAL- Editorial

easy
hashing
kingofnumbers
ltime60
string
xellos0

#1

PROBLEM LINK:

Div1
Div2
Practice

Setter- Hasan Jaddouh
Tester- Jakub Safin
Editorialist- Abhishek Pandey

DIFFICULTY:

EASY

PRE-REQUISITES:

Hashing (or precisely, Identifying Palindromic Sub-string using Hashing)

PROBLEM:

Given a string S , tell how many k-shifts of the string are palindromic. k-shift of a string is, the string obtained on shifting every character right by k positions (circularly).

QUICK EXPLANATION:

We know that we can obtain all cyclic shifts of an array by appending an array to itself and moving the N-sized window over it (to get different cyclic shifts). We use this to avoid having to cyclically shift the string again and again. After that, we use the concept of hashing to check if the strings are palindrome or not.

EXPLANATION:

This editorial will have two parts. The first one will describe the cyclic-shift trick, and the other will describe the hashing part. It is expected that you have gone through the pre-requisites if concept of hashing is new to you. The basics of identifying palindrome string through hashing is given in the link under pre-requisites. Hence, the editorial will not explicitly describe on how to check that.

1.Cyclically Shifting Trick-

Whenever we are given the task to cyclically shift an array, or a string, we can use this trick to avoid having to actually cyclically shift the entire array again and again.

This trick says that, append the string/array to itself. Eg, if S=abc then S'=S+S=abcabc. Now, consider the string from range i to i+N to obtain the i'th cyclic shift of the string.

Eg-

alt text

Starting from i=1 to N (1-based indexing) will give you left cyclic shift of string, while decrementing from i=N to i=1 will give you right cyclic shifts of string. Now, I have one question to ask.

Why does this work? What is the foundation principle of this trick?

(Answer will be in Chef Vijju’s corner)

2. Hashing the String-

You can refer to the setter’s code here for the concept for now. Please note that the hash function and operations in it can vary from person to person. We will be discussing w.r.t. setter’s solution first.

Now, we know the cyclic-shift trick. Using that, we can reduce the problem to "Find how many substrings (contiguous) of length N are palindrome in the string S'(=S+S)"

If you’ve read the pre-requisite link, you would have got how we used the hash function of-

h(x)= \sum_{i=0}^{i=n-1} ({101}^{i}*S_i)\%mod where S_i= i’th character of the string.

Setter has used a similar hashing function of-

H(X)= \sum_{i=0}^{i=n-1} ({31}^{i}*(S_i-`a'))\%mod

The steps to check for palindrome are similar. At each step, he checks for a sub-string of length N, modifies the hash function according to what characters are present in the N length window. This step involves, w.r.t. variable hsh in his solution, removal of term with lowest power of {31}^{i} and adding the term {31}^{i}*(S_i-`a') to the function. (If you cannot make sense of this addition removal of lowest/highest power term, please refer again to link in pre-requisites). Can you try working out for hsh2 of setter’s code? Answer will be in the tab below-

Click to view

Don’t get intimidated by the hsh2*=p[2] thing, its nothing but a correction factor. What happens at hsh2 is completely opposite of what happens at hsh , i.e. the highest power (of p) term gets cut-off and the immediate lowest available term is added. Eg- If N=5 ,then we have calculated hashes for terms in range i=0 to i=4. We multiply it by {31}^{2} (making the terms with us having powers between 2 to 6) and remove the term with highest power of {31} (i.e. {31}^{6}). We then add the term with {31}^{1} and check for equality.

Now, what does this computation on hashing actually mean? At each step, hsh is modified to represent the string shifted left (i+1) times, while hsh2 represents the reverse of string S being cyclically shifted right (i+1) times. Compare them in the table below-

alt text

Can you actually tell the significance of {31}^{i} in the hash function? As in, why should we take any constant, say p, and make hash function of type H(X)=\sum_{i=0}^{i=n} {p}^{i}*S_i. Why multiply with {p}^{i} ?

The tester uses a similar concept, except that he tries to make his hash function even less error prone by introducing another prime number for \% operation and making hash as H(X)=H_1*p+H_2 where $H_1$and H_2 and calculated similarly as described above. The tester explicitly uses our cyclic-shift trick. His H represents the normal string S, which H_i represents the reverse of string S. If hashes of these two are equal, then that means that the N length sub-string, and its reverse are equal, which means that the string is palindrome. Hashes are updated in a manner, which is similar to “Sliding window”, i.e. add contribution of new element, remove contribution of the oldest element (element to be removed).

A more formal description of tester’s H and H_i function will be-

H= \sum_{i=0}^{i=2N-1} {p}^ {i}*(S_i-`a ' + 1) for both H_1 and H_2 and

H_i = \sum_{i=2N-1}^{i=0} {p}^{i}*(S_i-`a'+1), again, for both H_{i1} and H_{i2}

In a gist, we “represent” the strings by hash function. If we find that the hash function of the string, and the reverse is equal, it means that the string is palindrome. The only thing to take care of is, your hash function must be good, and not have a high probability of error. Usually its acceptable if its probability of failing is \le 0.001 \%

SOLUTION:

For immediate availability of setter and tester’s solution, they are also pasted in the tabs below. This is for your reference, and you can copy code from there to wherever you are comfortable reading them.

Setter

Click to view
#include <iostream>
#include <algorithm>
#include <string>
#include <assert.h>
using namespace std;
 
 
long long readInt(long long l,long long r,char endd){
	long long x=0;
	int cnt=0;
	int fi=-1;
	bool is_neg=false;
	while(true){
		char g=getchar();
		if(g=='-'){
			assert(fi==-1);
			is_neg=true;
			continue;
		}
		if('0'<=g && g<='9'){
			x*=10;
			x+=g-'0';
			if(cnt==0){
				fi=g-'0';
			}
			cnt++;
			assert(fi!=0 || cnt==1);
			assert(fi!=0 || is_neg==false);
			assert(!(cnt>19 || ( cnt==19 && fi>1) ));
		} else if(g==endd){
			assert(cnt>0);
			if(is_neg){
				x= -x;
			}
			assert(l<=x && x<=r);
			return x;
		} else {
			assert(false);
		}
	}
}
string readString(int l,int r,char endd){
	string ret="";
	int cnt=0;
	while(true){
		char g=getchar();
		assert(g!=-1);
		if(g==endd){
			break;
		}
		cnt++;
		ret+=g;
	}
	assert(l<=cnt && cnt<=r);
	return ret;
}
long long readIntSp(long long l,long long r){
	return readInt(l,r,' ');
}
long long readIntLn(long long l,long long r){
	return readInt(l,r,'

‘);
}
string readStringLn(int l,int r){
return readString(l,r,’
‘);
}
string readStringSp(int l,int r){
return readString(l,r,’ ');
}

int T;
int n;
int sm_n=0;
string s,s2;
 
 
int b=31;
long long mod=1000000007;
long long p2[400400];
/*long long hsh[400400],hsh2[400400];
void hash_init(){
	for(int i=1;i<=n;i++){
		hsh* = (hsh[i-1] + (s* - 'a') * p2*) % mod;
		hsh2* = (hsh2[i-1] + (s2* - 'a') * p2*) % mod;
	}
}
 
long long get_hsh(int l,int r){
	return ((((hsh[r] - hsh[l-1]) * p2[n-l])% mod)+mod)%mod;
}
 
long long get_hsh2(int l,int r){
	return ((((hsh2[r] - hsh2[l-1]) * p2[n-l])% mod)+mod)%mod;
}
long long is_pal(int l,int r){
	return get_hsh(l,r) == get_hsh2(n-r+1,n-l+1);
}*/
int main(){
	//freopen("05.txt","r",stdin);
	//freopen("05o.txt","w",stdout);
	p2[0]=1;
	for(int i=1;i<400400;i++){
		p2*= (p2[i-1] * b)%mod;
	}
	T=readIntLn(1,1000);
	while(T--){
		s=readStringLn(1,200000);
		n= s.length();
		sm_n += n;
		for(int i=0;i<n;i++){
			assert('a'<=s* && s*<='z');
		}
		s2=s;
		reverse(s2.begin(),s2.end());
		
		int sol=0;
		long long hsh=0,hsh2=0;
		for(int i=0;i<n;i++){
			hsh += p2* * (s* - 'a');
			hsh %= mod;
			hsh2 += p2* *(s2* - 'a');
			hsh2 %= mod;
		}
		for(int i=0;i<n;i++){
			hsh -= p2* * ( s* - 'a');
			hsh += p2[i+n] * ( s* - 'a');
			hsh %= mod;
			if(hsh < 0 ) hsh += mod;
			hsh2 *= p2[2];
			hsh2 %= mod;
			hsh2 -= p2[n+1+i] * (s2[n-1-i] - 'a');
			hsh2 += p2[i+1] * (s2[n-1-i] - 'a');
			hsh2 %= mod;
			if(hsh2 < 0 ) hsh2 += mod;
			if(hsh  == hsh2)sol++;
		}
		cout<<sol<<endl;
	}
	assert(getchar()==-1);
} 

Tester

Click to view
#include <bits/stdc++.h>
// iostream is too mainstream
#include <cstdio>
// bitch please
#include <iostream>
#include <algorithm>
#include <cstdlib>
#include <vector>
#include <set>
#include <map>
#include <queue>
#include <stack>
#include <list>
#include <cmath>
#include <iomanip>
#include <time.h>
#define dibs reserve
#define OVER9000 1234567890
#define ALL_THE(CAKE,LIE) for(auto LIE =CAKE.begin(); LIE != CAKE.end(); LIE++)
#define tisic 47
#define soclose 1e-8
#define chocolate win
// so much chocolate
#define patkan 9
#define ff first
#define ss second
#define abs(x) (((x) < 0)?-(x):(x))
#define uint unsigned int
#define dbl long double
#define pi 3.14159265358979323846
using namespace std;
// mylittledoge
 
using cat = long long;
 
#ifdef DONLINE_JUDGE
	// palindromic tree is better than splay tree!
	#define lld I64d
#endif
 
const cat mod[2] = {1'000'000'007, 1'000'000'009};
const cat p = 999983;
 
int main() {
	cin.sync_with_stdio(0);
	cin.tie(0);
	cout << fixed << setprecision(10);
	int T;
	cin >> T;
	for(int t = 0; t < T; t++) {
		string S;
		cin >> S;
		int N = S.length();
		S = S + S;
 
		vector<cat> H[2], Hi[2];
		vector<cat> pw[2];
		for(int k = 0; k < 2; k++) {
			H[k].resize(2*N+1, 0);
			Hi[k].resize(2*N+1, 0);
			for(int i = 0; i < 2*N; i++)
				H[k][i+1] = (H[k]* * p + S*-'a'+1) % mod[k];
			for(int i = 2*N-1; i >= 0; i--)
				Hi[k]* = (Hi[k][i+1] * p + S*-'a'+1) % mod[k];
			pw[k].resize(2*N+1, 1);
			for(int i = 1; i <= 2*N; i++) pw[k]* = (pw[k][i-1] * p) % mod[k];
		}
 
		int ans = 0;
		for(int i = 0; i < N; i++) {
			cat hash_direct[2], hash_inverse[2];
			for(int k = 0; k < 2; k++) {
				hash_direct[k] = (H[k][i+N] - pw[k][N] * H[k]*) % mod[k];
				if(hash_direct[k] < 0) hash_direct[k] += mod[k];
				hash_inverse[k] = (Hi[k]* - pw[k][N] * Hi[k][i+N]) % mod[k];
				if(hash_inverse[k] < 0) hash_inverse[k] += mod[k];
			}
			cat hd = hash_direct[0] * mod[1] + hash_direct[1];
			cat hi = hash_inverse[0] * mod[1] + hash_inverse[1];
			if(hd == hi) ans++;
		}
		cout << ans << "

";
}
return 0;}

// look at my code
// my code is amazing

Editorialist’s Solution

Time Complexity= O(N) for setter and tester’s solution.

CHEF VIJJU’S CORNER :smiley:

1. Foundation of cyclic-shift principle-

Click to view

This principle works mainly because the shift is cyclic, i.e. the last character warps to first character’s place for right shift and vice-versa for left shift. Appending the array takes care of this beautifully.

Talking with respect to left shift - Say we left shifted the string by 1 character. This means,we have to look at characters from i=1 to N (0-based indexing) in our String S'(=S+S). The character at index 0, is already at index N, the sub-string starts from required character, and relative order of every other character is preserved, so we need not care about them

For now, I am tending to leave the formal proof for you guys to google out. Ping me if its not found.

2. Significance of {p}^{i} in hash function.

Click to view

This {p}^{i} assigns an importance/a weight to position where a character occurs. If we simply use \sum_{i=0}^{i=2N-1}(S_i- ` a '+1), then it will be equal to hash of all strings where “same characters occur, irrsepective of order”. Meaning, for this has function, string “aab” and “baa” will be ‘same’.

Recall the concept of Hash functions- If the hash function is ideal, then hashes of 2 strings will be equal if and only if both the strings are equal.

3. A view on Hash functions-

Click to view

As we all know, hashing is never 100\% accurate. Most of the hash functions have a set of input where they may fail. Can you briefly analyze the setter and tester’s hash function? Can we make a test case against it? Why or Why not?

Hashing says that any hash function can “do the job”, but some hash functions can do the job much better than others. It takes a little amount of experimentation (and risk-taking!) to decide what is the perfect hash function for your input data. While hashing may fail at some input out of infinite inputs in the domain, it is possible to get a hash function completely accurate over the set of limited input (Eg- limited test cases). Theres only one thing to watch out for…if the setter hates hashing and has some anti-hash test cases against your popularly-used hash function! Hence, its often good to grasp the concept and make one of your own hash function rather than using the popular ones each time.

4. Regarding any doubts which might arise after reading editorial :slight_smile:

Click to view

I am well aware that there are/will-still-be a few doubts related to hashing right now, especially if you are new. I’d say, take some time, try to grasp the pre-requisite link first, and then re-read the editorial. And if that doesnt answer, immediately ask so we can address it. All such discussions will be added and compiled in editorial to help enhance it, and to help people with similar doubts :slight_smile:

One thing I’d address is, the aim of the editorial is that after reading it, at least one of the approaches (of tester, or of setter) is clear to you. The mathematics behind the hash function may take some time, but your aim should be in seeing how you can make a hash function yourself than copying theirs. Tester’s hash function is easier to understand, while there are some references available for concept of setter’s hash function. If you are able to get the physical meaning, and what they are doing/intending to do, thats good enough for first few tries. The mathematics to understand what does what, how to manipulate the hash function &etc. comes with practice, so dont panic if thats not immediately clear after reading the editorial.

5. We did have a look at hashing solution of this problem. Does a solution without hashing exist? Its your task to find out! Check out the list of successful submissions :slight_smile: . (I will update this section if there I am able to find a good solution without hashing).

6. An alternate perspective for understanding hash function.

Click to view

You can correlate hash-function with polynomials-

H(X)= \sum_{i=0}^{i=2n-1} {P}^{i}*{X_i}^{i+1}

At each iteration, starting from i=0, we take N continuous terms of this polynomial. X_i is nothing but the “value we assigned to the character ch (eg- ch-‘a’+1 etc.)”. We know how quickly {P}^{i} and {X}^{i+1} will grow. This means, it emphasizes/strengthens the fact that, two hashes can be equal if and only if the strings are exactly same.

Think a bit, say I have 2 strings S and S' of length 100. Say, both are same, except that S' = swapping positions of any two characters in S. What is the maximum and minimum hash which we can obtain? Can it collide with hash of S?

7. Tester’s Notes-

Click to view

The solution for SHIFTPAL is just “duplicate the string, then check for each substring of length N is a palindrome”; checking if a substring is a palindrome is trivial with hashing.

8. Some related problems on hashing-


#2

Please help me https://www.codechef.com/viewsolution/18671686 .
https://www.codechef.com/viewsolution/18671710 . .

Upd - Solved https://www.codechef.com/viewsolution/19423064 .


#3

I solved this problem using Manacher’s Algorithm. First I doubled the string and then used Manacher’s Algorithm on the new string, so I only need to find out how many locations have a palindrome radius greater than length.
And here is my solution:
https://www.codechef.com/viewsolution/18664253


#4

Please tell why am I getting WA

https://www.codechef.com/viewsolution/18693156

Calulating cyclic shift strings using substring and checking with palindrome using reverse()


#5

This view content is excellent :smiley:


#6

If the characters are same, then the answer is equal to the length of string (e.g if s = “aaaa”, answer is 4)

However, can the answer ever be greater than 2 if all characters aren’t same?

If yes, can someone please provide an example. Thanks in advance.


#7
Click to view

#include
#include<bits/stdc++.h>
typedef long long int lli;
using namespace std;
lli pow2(lli p){
return 1 << p;
}
int main()
{
lli t,n;string s;
cin>>t;lli count=0;
for(lli tt=0;tt<t;tt++)
{
cin>>s;
n=s.length();
s=s+s;
count=0;
lli sum1=0,sum2=0;int k=2;
for(lli i=0;i<n;i++)
{
sum1+=(s*-‘a’+1)*(lli)(pow2(i));

        sum2+=(s*-'a'+1)*(lli)(pow2(n-1-i));
    }
 
    if(sum1==sum2)count++;
    for(lli i=1;i<n;i++)
    {
        lli j=i+n-1;
        sum1-=(s[i-1]-'a'+1);
        sum1/=k;
        sum1+=(s[j]-'a'+1)*(lli)(pow2(n-1));
        sum2-=(s[i-1]-'a'+1)*(lli)(pow2(n-1));
        sum2*=k;
        sum2+=(s[j]-'a'+1);
        if(sum1==sum2)count++;
    }
    cout<<count<<"

";
}
return 0;
}

can some one help me…thx in advance!!


#8

why you are writing so many #define in code?
what is logic behind this?

Tester’s solution


#9

@vijju123

How to deal with the problem of rounding of powers of ‘p’ when the power value crosses mod value?

As in, when we are comparing hash and reverse hash, one hash is multiple of other hash(as given in prerequisite). But that holds true only if the power values round up at the same point for given string.

e.g consider string s=abbbbbba.
ns = s+s = abbbbbbaabbbbbba

now if string under consideration is [2 7].
Suppose that while computing prefix array, power value rounds off at index 5.
And suppose that while computing suffix array, power value rounds off at index 7.

So, the hash[2 7] using prefix array CANT BE EQUAL to hash[2 7] using suffix array.

How to deal with this problem?


#10

Here is my clean hash solution : https://www.codechef.com/viewsolution/21761423
Manacher’s Algorithm Solution : https://www.codechef.com/viewsolution/21760228


#11

I didn’t read through your code very carefully, but it seems that you haven’t accounted for the case when subtracting the two prefix sums, or two suffix sums makes it negative. You should add MOD to it in that case.


#12

You can check out my code here BTW I think I have written it quite clearly http://ix.io/1bvW


#13

My method of subtraction.

y=(a[1][k][2*l-i]-a[1][k][l-i]);
y%=mod[k];
y=(y+mod[k])%mod[k];


#14

A nice alternative approach :slight_smile:


#15

Your code failed on sample case itself. What is the logic behind your hash function?


#16

Unable to format my comment.

This question was my first attempt with Hash. Yes, it also fails on sample test case.

m = some base. I took 26,101,1001 etc. But failed on all. There 3 for mod. All stored in modular airthmetic Hash for prefix s[0] s[0]*m+s[1] s[0]*m2+s[1]*m+s[2] s[0]*m3+s[1]*m2+s[2]*m+s[3]+....

similarly for s[2*l] s[2*l]*m+s[2*l-1] s[2*l]*m2+s[2*l-1]*m+s[2*l-2] s[2*l]*m3+s[2*l-1]*m2+s[2*l-2]*m+s[2*l-3]+....

for substring [i,j]

hash value (prefix[j]-prefix*)*(modInv(m))^i //here i found error while writing this. Will try again and get back.


#17

cool, that’s really thinking out of the box :slight_smile:


#18

I formatted it for you. Time for a comfortable reading now :smiley:


#19

I have a doubt.
If the function STRREV would have been available will this much of headache coding still be required?


#20

Nice editorials @vijju123. I wish you could be editorialist for all rated contests :slight_smile: