Fast Fourier Transform, bitwise operations.
Given a binary string A of size N and B of size M. For each sub-string x of A having length |B|, you need to find hamming distance
between x and B.
Hamming distance of the two binary string of the same length is the number of the positions where the two strings have
- eg. the hamming distance of “01010101” and “00110111” is 3.
- the hamming distance of a binary string with itself is 0.
Primarily there are three possible solutions depending on the subtasks.
For passing the first subtask, you can directly use brute-force to calculate hamming distance between two strings.
For passing the second subtask, you can directly use brute-force with a small optimization of packing 64 bits into a single
integer which can be stored in unsigned long long data type in C++. Finding hamming distance between two strings x and y can be done
by bitwise operations which are normally executed using gates which takes almost constant CPU cycles.
You can make use of the fact that if you represent the string using polynomials, problem of finding hamming distance can be
converted to multiplying two polynomials which can be done faster using fast fourier transform.
Let us start with subtasks in increasing order of difficulty.
As N and M are quite small, we can try all possible sub-strings x of A having length M and we can find hamming distance between
two string x and B. Finding hamming distance between two strings can be done in O(sum of sizes of strings).
For finding hamming distance between s and t, we can simply iterate over the string s and count the number of positions
where s and t have different characters.
calculateHammingDistance(s, t): ans = 0; for i = 0 to |s|: if s* != t*: ans++; return ans; foreach sub-string x of A: calculateHammingDistance(x, B);
Strings are binary Strings !!!
Let’s make use of the fact that strings we are dealing with are binary strings. Can we somehow use bitwise operations to finding hamming
distance faster. Assume we have to find hamming distance between s and t. Let us pack s into s / 64 groups of integers. Each group
of integer represents 64 bits of s. Now we need to find out a way to evaluate number of bits where two integers differ.
Use bitwise operations
For that we can make use of xor operation. Let us understand xor operation, If the bits are same, then xor is zero, otherwise it is one.
So number of differing bits between x and y will be equal to number of set bits (bits having value equal to 1) of x ^ y
(here ^ represents bitwise xor operation).
Finding set bits in an integer
For finding set bits of an integer, we can assume
__builtin_popcount in c++. Actually there are lot of ways of doing this. You
can have a look this wiki link and this blog.
Se will initially divide the entire string A into A / 64 parts.
Then for each sub-string, we will have M / 64 operations for finding hamming distance. As there are N - M + 1 sub-strings.
So Overall time complexity will be O((N - M + 1) * M / 64) .
Represent strings in terms of polynomials
Now we have to make slightly sophisticated use of polynomials. Let us represent our strings as polynomials.
Note that we will replace 0 with -1 in the string to get a polynomial of degree equal to size of string - 1.
- String 010 will be a polynomial (-1) x^2 + 1 x + -1 = - x^2 + x - 1.
- String 100 will be a polynomial (1) x^2 + -1 x + -1 = x^2 - x - 1
- String 0 will be a polynomial -1.
- String 1 will be a polynomial 1.
What does hamming distance represent in terms of polynomials?
Now let us see how to represent the hamming distance between two strings using coefficients of polynomials.
Take string s and t. Reverse the string t. Represent both the strings as polynomials as described above.
Now take the product of both the polynomials, you can observe the absolute value of coefficient of x^(|s| - 1) will be equal to number of equal bits -
number of different bits. As we know total number of bits, we can exactly find number of equal or different bits easily by
solving two linear equation in two variables.
Let’s take an example
Let s = 010, t = 100.
Initially we reverse the string t. So now t becomes 001.
String 010 will be a polynomial p = - x^2 + x - 1.
and String 011 will be a polynomial q = -1x^2 -1 x + 1.
Now see the coefficient of x^2 in product of both polynomials p and q.
it will be (-1 * 1 + 1 * -1 + -1 * -1) = -1. Absolute value of it will be 1.
Note that if both the bits are equal, then product of those is 1 (as 1 * 1 = 1 and -1 * -1 = 1).
If they are different, then the product will be -1. (As 1 * -1 = -1).
So overall the coefficient will be number of equal bits - number of different bits which in our case is 1.
As we know the number of different bits is precisely hamming distance.
We also know that number of equal bits + hamming distance = 3.
and number of equal bits - hamming distance = 1.
So solving these two equations, we get hamming distance = 2 and number of equal bits = 1.
So we will take strings A and B. Reverse the string B. Represent both the strings as polynomials as described above.
Now instead of iterating over all the sub-strings of A. we will simply multiply the polynomials corresponding to String A and
reverse string of B.
For each sub-string, we can get the answer by looking at the corresponding coefficients. Please look into previous example
for better understanding.
If A = 0100 and B = 100.
Then we need to take care of coefficients of x^3 and x^2. Coefficient of x^3 can be used to find out hamming distance of
010(A[0:2]) and 100(B) and coefficient of x^2 can be used to find out hamming distance of 100(A[1:3]) and 100(B).
Multiplying Two polynomials
We can multiply two polynomial of deg n faster than n^2. We can use fast fourier transform(FFT) to do this in O(n log n) time.
For understanding FFT, you can use the following links.
Usually for FFT, your polynomials should usually be of degrees equal to some power of 2.
As FFT works in n log n where n is sum of degrees of polynomials which we are trying to multiply.
So Overall, it will O(N log N) where N is smallest power of two greater than or equal to size of string A + B.
Problems to Practice
Please add more problems for practice !!