CHSTR - Using Perfect Hashing

vikky_codder · June 15, 2015, 5:51pm

Did someone get 100 using hashing in this problem?

raja44ever · June 15, 2015, 6:02pm

I did ,see the solution CodeChef: Practical coding for everyone

priyankpalod · June 15, 2015, 7:04pm

Can Somebody help me understand why my O(n^2) algorithm using hashing gives TLE for the third subtask.
http://www.codechef.com/viewsolution/7211894

My Algo is that I hash all the substrings into a hashTable using polynomial hashing (mod size of the table). I keep an id for all strings which is its value in base 27 (considering ‘a’ as 1 … ‘z’ as 26). I maintain an array setsWithSize which contains the number of different strings which occur as substrings exactly i times at setsWithSize[i].
After hashing all the substrings, I compute the array answ[] which contains the answer for query k at answ[k].

megatron_64 · June 15, 2015, 7:16pm

@vikky_coder , I think the hash function is just a variant of Rolling hash

Rolling hash is basically the following algorithm

hash = (hash * base + s[i]) % MOD

The crucial point in using hashing is to perfectly choose base and MOD

You can use Mod wrt more than one prime for safety

thechamp103 · June 15, 2015, 9:58pm

I did it without hash

http://www.codechef.com/viewsolution/7137470

geek_geek · June 15, 2015, 10:36pm

Well lot of people did using rabin karp and polynomial hashing …

i dont understand how collisions could be avoided.

lets say a%m =x
implies (a + n*m)%m = x where n is any positive integer
the range values obtained will be from 0 to 26^(5000) as 5000 is the size of the string now even if you take a prime number as big as 10^18 (64 bit limit in c++) how can you gaurentee it wont cause collision in a unbiased test cases??

vikky_codder · June 15, 2015, 6:11pm

Can you please explain your hash function? thanks!

thezodiac1994 · June 15, 2015, 6:34pm

I don’t see how this will run without collisions if the test data is strict.

raja44ever · June 15, 2015, 7:10pm

@thezodiac1994 just take hashing base as a big prime no. so that collision can be prevented

raja44ever · June 15, 2015, 7:17pm

@vikky_codder first calculate the hash value for first substring of length i and the remaining can be calculated using this statement
ht = int_mod(ht*b - s[j-i-1]*e + s[j],mod);
…where i+1<=j<=n and b is the base(which is a big prime no) and e=b^i

vikky_codder · June 15, 2015, 9:35pm

I used rolling hash with some combonations of base and MOD, but all I got was WA and in some cases TLE due to perhaps because of large value of MOD. The problem is “how to choose base and MOD for perfect hashing?”.

vikky_codder · June 15, 2015, 9:42pm

@raja44ever thanks

vikky_codder · June 15, 2015, 9:49pm

@priyankpalod how is it O(n^2)? What about h.add() that is being called for all substrings?

vippu95 · June 15, 2015, 9:54pm

@vikky_coder The best approach to do hashing is to use a prime mod say 10^9+3 and two bases say 29 and 59 . Then calculate two hash functions for each string one using base 29 and other using base 59. Two strings are then said to be equal if and only if both the hashes are same for the two strings . It is highly unlikely that this approach will give collisions (at least it never did for me).

priyankpalod · June 15, 2015, 10:21pm

@vikky_codder h.add() just adds the substring in the hashtable and since the hashTable is very large we can safely assume it takes constant time (very less load factor)

thezodiac1994 · June 15, 2015, 10:32pm

That still doesn’t assure that all keys will be distinct.

thezodiac1994 · June 15, 2015, 11:56pm

As the prime is increased, the probability of collisions decreases. I don’t think the standard RK algo is 100 % collision proof but when you use a large hash table, you can do away with “NOT SO STRICT” data. I think thats what has happened here.
PS : You cant use a prime as large as 10^18 because you need to perform multiplications in the hash funcn which will result in overflow.

megatron_64 · June 18, 2015, 12:10pm

Look, you can never guarantee the elimination of collisions. All you can do is insure that collisions are minimum. For that you may use two bases as vippu95 said, or you can use modulo two or more primes. The strings will be equal if the hash value for each modulo is same.