Help in Rabin-karp-algorithm

pakalupapito · May 26, 2020, 1:25pm

I was learning rabin karp algorithm and solving a basic problem of finding indices of pattern in a string with the help of string hashing technique from here But the code is just not printing out anything.(basically unable to match a string)

I used this algorithm yesterday to solve chefship problem and got AC but now I’m unable to find where’s the issue.

here’s my code-

#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
ll mod = 1000000007;
#define max 100005
int p = 31;

ll hashfunction[max], power[max];

void precompute(string text, int n){

    power[0]=1;
    hashfunction[0] = text[0]-'a'+1;
    ll pow = p;
    for(int i=1; i<n; i++){
        power[i] = pow;
        hashfunction[i] = (hashfunction[i-1] + ((text[i]-'a'+1)*pow)%mod)%mod;
        pow = (pow * p)%mod;
    }
}

ll getHash(int l, int r){
    if(l==0) return hashfunction[r];
    else return (mod + hashfunction[r] - (hashfunction[l-1]*power[r-l+1])%mod)%mod;
}

ll computeHashofstring(string pattern){
    ll hash_value = 0;
    for(int i=0; i<pattern.length(); i++){
        hash_value += ((pattern[i]-'a'+1)*power[i])%mod;
    }
    return hash_value;
}

vector<int> solve(string text, string pattern){

    vector<int> v;

    precompute(text, text.length());
    ll hashofpattern = computeHashofstring(pattern);

    int left=0, right=pattern.length()-1;
    while (right<text.length())
    {   
        if(hashofpattern == getHash(left,right)){
            v.push_back(left);
        }
        
        left++,right++;
    }

    return v;
}

int main(){

    string text, pattern;
    cin >> text >> pattern;

    vector<int> v = solve(text, pattern);

    for(int i=0; i<v.size(); i++) cout << v[i] << '\n';

    return 0;
}

Can anyone help?

aayush_84719 · May 26, 2020, 2:40pm

Just two modification to your code :

getHash function
If condition
Have a look : xPGdZ4 - Online C++0x Compiler & Debugging Tool - Ideone.com
Explanation : The hash value by you from [l,r] was incorrect.
Suppose, text – xyabdf, pattern – ab
The hashValueOfPattern = a + b.p
and hashValueOfText[3] = x + y.p + a.p.p + b.p.p.p
We know, the given pattern exist in text at index 2. Thus,
getHash(2,3) == hashValueOfPattern.
But according to your code,
getHash(2,3) = hashValueOfText[3] - hashValueOfText[1].p.p
= x + y.p + a.p.p. + b.p.p.p - x.p.p - y.p.p.p
!= hashValueOfPattern.
Thus giving wrong answer.

pakalupapito · May 26, 2020, 3:12pm

Thank you very much, I got it.

pakalupapito · May 26, 2020, 6:24pm

Man, I need one more help, Actually I was doing one question on this algo but found out that I was unable to find hash value of one character in a string-

For example-
consider the string “abab”
the hash value of this string is a + b.p + a.p.p + b.p.p.p
when I try to find hash value of 3rd character (which must be same as first character) I get the different hash values instead of same.

What to do here? I was doing this problem.

aayush_84719 · May 26, 2020, 7:20pm

Let string s = “abab”
CASE I : Storing hash Values in forward manner.
hashF[1] = a
hashF[2] = a + b.p
hashF[3] = a + b.p + a.p.p
hashF[4] = a + b.p + a.p.p + b.p.p.p
hash Value of any string [l,r] can be calculated as
res_hash = (hashF[r] - hashF[l-1]) * modInverse(p^(l-1))
modInverse is done because you need to divide the ans by p^(l-1).
Example : L=3,R=3,res_hash = (hashF[3]-hashF[2]) * modInverse(p^2)
res_hash = a.p.p * modInverse(p^2) = a [You can proof it]
If you do this, then hashValue of similar substring will be same.

CASE II : Storing hash in backward manner
hashB[4] = b
hashB[3] = a + b.p
hashB[2] = b + a.p + b.p.p
hashB[1] = a + b.p + a.p.p + b.p.p.p
hash Value of any string [l,r] can be calculated as
res_hash = hashB[l] - (hashB[r+1] * p^(r-l+1))
Example : L=3,R=3, res_hash = hashB[3] - (hashB[4] * p^(3-3+1))
res_hash = a + b.p - b.p = a
Thus, by using case II , we do not require modInverse(log(n) complexity)

akram01 · May 27, 2020, 5:57am

The hash value of the 3rd character will not be the same as the 1st character even though both are the same letter ‘a’. That’s what this hashing is all about. It doesn’t just depend on what character it is. It also depends on what position the character is at. We are basically signifying the position of each character by multiplying it with some constant ‘p’ raised to the power that corresponds to its position.

This is done so that the permutations of the same sub-string do not map to the same hash value. Do correct me if I’m wrong. I’m not that sure about this.

pakalupapito · May 27, 2020, 7:00am

Ohhh, Really thanks for the help. That’s why I was thinking why people use backward hashing. Also we can remove the log(n) complexity in case-1 by precomputing the modinverse of every power of p, am I right? But really thanks for the help. I appreciate it.

pakalupapito · May 27, 2020, 7:02am

But then hash value of a substring for example cdf must be same for all of its appearances in a given string like abcdfgcdf the first cdf and second cdf in this string must have a same value.

aayush_84719 · May 27, 2020, 7:07am

Pre-Computation of modInverse is also possible in linear time. But, I prefer backward hashing . You can try any question with both the ways. The result will surely not diifer. Trying both the types is a good habit (not in short contest). Now, you can easily solve the question (in which you got stuck).

pakalupapito · May 27, 2020, 8:26am

Thanks for your help. I understood it, My code is giving wrong answer on 8th test case. Looks like there is collision, I changed value of p from 31 to 53 but it didn’t help. I guess I should leave it here

aayush_84719 · May 27, 2020, 9:18am

I looked at your code. It seems to be correct. The failure is because of collision (probably). I suggest you to set the MOD value in the range 10^15(prime). If it fails then also, Double Hashing is the last option. And, use set instead of map (only for ease of implementation).