I have been learning about hashing for a while specifically about string hashing and I came across a string hashing algorithm known as djb2. Refer the link: Hash Functions.
The source-code of djb2 hashing algorithm is as follows:
static uint32_t djb2_hash_function(const char *data) {
uint32_t hash = 5381;
char read_char;
while(0 != (read_char = *data++)) {
hash = ((hash << 5) + hash) + read_char;
}
return hash;
}
There are a couple of questions I have:
- What is the significance of the number 5381 can I change it?
- Can you tell me scenarios, where there will most number collisions, will happen?
- Hash function returns a value of type
unsigned 32-bit
that means the maximum value can be 2^{32} - 1 = 4,294,967,295 which is high enough to create a hash-table, so how can I map it to a smaller value to create a hash-table?
E.g. Suppose I have a string that can be of maximum length 5, so there can be 26^5 = 11,881,376 that is around 12 million possibilities that too only in the case of a string which consists of lower-case alphabets [a-z]. If I wanted to create a hash-table to check whether a string has been repeated or not in a data-set, then if I use the above hashing function, it will give me32-bit unsigned
hash values and I cannot create an array size of length 2^{32} - 1 = 4,294,967,295, Is there any solution to this scenario?
Thanks for reading.
Peace