Did someone get 100 using hashing in this problem?

Can Somebody help me understand why my O(n^2) algorithm using hashing gives TLE for the third subtask.

http://www.codechef.com/viewsolution/7211894

My Algo is that I hash all the substrings into a hashTable using polynomial hashing (mod size of the table). I keep an id for all strings which is its value in base 27 (considering ‘a’ as 1 … ‘z’ as 26). I maintain an array setsWithSize which contains the number of different strings which occur as substrings exactly i times at setsWithSize[i].

After hashing all the substrings, I compute the array answ[] which contains the answer for query k at answ[k].

@vikky_coder , I think the hash function is just a variant of Rolling hash

Rolling hash is basically the following algorithm

hash = (hash * base + s[i]) % MOD

The crucial point in using hashing is to perfectly choose ** base ** and ** MOD **

You can use Mod wrt more than one prime for safety

Well lot of people did using rabin karp and polynomial hashing …

i dont understand how collisions could be avoided.

lets say a%m =x

implies (a + n*m)%m = x where n is any positive integer

the range values obtained will be from 0 to 26^(5000) as 5000 is the size of the string now even if you take a prime number as big as 10^18 (64 bit limit in c++) how can you gaurentee it wont cause collision in a unbiased test cases??

Can you please explain your hash function? thanks!

I don’t see how this will run without collisions if the test data is strict.

@vikky_codder first calculate the hash value for first substring of length i and the remaining can be calculated using this statement

ht = int_mod(ht*b - s[j-i-1]*e + s[j],mod);

…where i+1<=j<=n and b is the base(which is a big prime no) and e=b^i

I used rolling hash with some combonations of base and MOD, but all I got was WA and in some cases TLE due to perhaps because of large value of MOD. The problem is “how to choose base and MOD for perfect hashing?”.

@vikky_coder The best approach to do hashing is to use a prime mod say 10^9+3 and two bases say 29 and 59 . Then calculate two hash functions for each string one using base 29 and other using base 59. Two strings are then said to be equal if and only if both the hashes are same for the two strings . It is highly unlikely that this approach will give collisions (at least it never did for me).

@vikky_codder h.add() just adds the substring in the hashtable and since the hashTable is very large we can safely assume it takes constant time (very less load factor)

That still doesn’t assure that all keys will be distinct.

As the prime is increased, the probability of collisions decreases. I don’t think the standard RK algo is 100 % collision proof but when you use a large hash table, you can do away with “NOT SO STRICT” data. I think thats what has happened here.

PS : You cant use a prime as large as 10^18 because you need to perform multiplications in the hash funcn which will result in overflow.

Look, you can never guarantee the elimination of collisions. All you can do is insure that collisions are minimum. For that you may use two bases as vippu95 said, or you can use modulo two or more primes. The strings will be equal if the hash value for each modulo is same.