×

# TASTRMAT - Editorial

Author: Tuan Anh
Tester: Gedi Zheng
Editorialist: Praveen Dhinwa

Hard

# PREREQUISITES:

Fast Fourier Transform, bitwise operations.

# PROBLEM:

Given a binary string A of size N and B of size M. For each sub-string x of A having length |B|, you need to find hamming distance between x and B.

Hamming distance of the two binary string of the same length is the number of the positions where the two strings have different characters.

• eg. the hamming distance of "01010101" and "00110111" is 3.
• the hamming distance of a binary string with itself is 0.

# QUICK EXPLANATION:

Primarily there are three possible solutions depending on the subtasks.

• For passing the first subtask, you can directly use brute-force to calculate hamming distance between two strings.

• For passing the second subtask, you can directly use brute-force with a small optimization of packing 64 bits into a single integer which can be stored in unsigned long long data type in C++. Finding hamming distance between two strings x and y can be done by bitwise operations which are normally executed using gates which takes almost constant CPU cycles.

• You can make use of the fact that if you represent the string using polynomials, problem of finding hamming distance can be converted to multiplying two polynomials which can be done faster using fast fourier transform.

# EXPLANATION:

As N and M are quite small, we can try all possible sub-strings x of A having length M and we can find hamming distance between two string x and B. Finding hamming distance between two strings can be done in O(sum of sizes of strings).

For finding hamming distance between s and t, we can simply iterate over the string s and count the number of positions where s and t have different characters.

Pseudo Code

calculateHammingDistance(s, t):
ans = 0;
for i = 0 to |s|:
if s[i] != t[i]:
ans++;
return ans;
foreach sub-string x of A:
calculateHammingDistance(x, B);


Strings are binary Strings !!!
Let's make use of the fact that strings we are dealing with are binary strings. Can we somehow use bitwise operations to finding hamming distance faster. Assume we have to find hamming distance between s and t. Let us pack s into s / 64 groups of integers. Each group of integer represents 64 bits of s. Now we need to find out a way to evaluate number of bits where two integers differ.

Use bitwise operations
For that we can make use of xor operation. Let us understand xor operation, If the bits are same, then xor is zero, otherwise it is one. So number of differing bits between x and y will be equal to number of set bits (bits having value equal to 1) of x ^ y (here ^ represents bitwise xor operation).

Finding set bits in an integer
For finding set bits of an integer, we can assume __builtin_popcount in c++. Actually there are lot of ways of doing this. You can have a look this wiki link and this blog.

Time Complexity
Se will initially divide the entire string A into A / 64 parts.
Then for each sub-string, we will have M / 64 operations for finding hamming distance. As there are N - M + 1 sub-strings. So Overall time complexity will be O((N - M + 1) * M / 64) .

Represent strings in terms of polynomials
Now we have to make slightly sophisticated use of polynomials. Let us represent our strings as polynomials. Note that we will replace 0 with -1 in the string to get a polynomial of degree equal to size of string - 1.
eg.

• String 010 will be a polynomial (-1) x^2 + 1 x + -1 = - x^2 + x - 1.
• String 100 will be a polynomial (1) x^2 + -1 x + -1 = x^2 - x - 1
• String 0 will be a polynomial -1.
• String 1 will be a polynomial 1.

What does hamming distance represent in terms of polynomials?
Now let us see how to represent the hamming distance between two strings using coefficients of polynomials.

Take string s and t. Reverse the string t. Represent both the strings as polynomials as described above.
Now take the product of both the polynomials, you can observe the absolute value of coefficient of x^(|s| - 1) will be equal to number of equal bits - number of different bits. As we know total number of bits, we can exactly find number of equal or different bits easily by solving two linear equation in two variables.

Let's take an example
Let s = 010, t = 100.
Initially we reverse the string t. So now t becomes 001.
String 010 will be a polynomial p = - x^2 + x - 1.
and String 011 will be a polynomial q = -1x^2 -1 x + 1.
Now see the coefficient of x^2 in product of both polynomials p and q.
it will be (-1 * 1 + 1 * -1 + -1 * -1) = -1. Absolute value of it will be 1.

Note that if both the bits are equal, then product of those is 1 (as 1 * 1 = 1 and -1 * -1 = 1).
If they are different, then the product will be -1. (As 1 * -1 = -1).
So overall the coefficient will be number of equal bits - number of different bits which in our case is 1.

As we know the number of different bits is precisely hamming distance.

We also know that number of equal bits + hamming distance = 3.
and number of equal bits - hamming distance = 1.

So solving these two equations, we get hamming distance = 2 and number of equal bits = 1.

Complete Solution
So we will take strings A and B. Reverse the string B. Represent both the strings as polynomials as described above.
Now instead of iterating over all the sub-strings of A. we will simply multiply the polynomials corresponding to String A and reverse string of B.

For each sub-string, we can get the answer by looking at the corresponding coefficients. Please look into previous example for better understanding.

eg. If A = 0100 and B = 100.
Then we need to take care of coefficients of x^3 and x^2. Coefficient of x^3 can be used to find out hamming distance of 010(A[0:2]) and 100(B) and coefficient of x^2 can be used to find out hamming distance of 100(A[1:3]) and 100(B).

Multiplying Two polynomials
We can multiply two polynomial of deg n faster than n^2. We can use fast fourier transform(FFT) to do this in O(n log n) time.

For understanding FFT, you can use the following links.

Time Complexity:
Usually for FFT, your polynomials should usually be of degrees equal to some power of 2. As FFT works in n log n where n is sum of degrees of polynomials which we are trying to multiply. So Overall, it will O(N log N) where N is smallest power of two greater than or equal to size of string A + B.

# AUTHOR'S, TESTER'S and Editorialist's SOLUTIONS:

This question is marked "community wiki".

2.5k53137170
accept rate: 20%

What's the difference between this problem and http://codeforces.com/contest/472/problem/G this?

(28 Dec '14, 14:19)

@gdisastery1: Solution for that problem is much more complex by using data structures etc. It is not simple FFT as it is the case here.

(28 Dec '14, 14:23)

 5 Is fast fourier transform even there in the IOI syllabus? I don't think it is. What is the point of setting such a question in a competition meant for kids preparing for IOI? answered 28 Dec '14, 14:14 1.3k●15●65●81 accept rate: 4% even lucas theorem or most of the modular arithmetic isn't in the IOI syllabus which was required to solve the second question. (28 Dec '14, 14:20) 1 @pushkarmishra: Sometimes setters are not from IOI background, I also did not knew IOI syllabus :( I have notified admins, they have said that it won't happen again. (28 Dec '14, 14:30) @dpraveen I have raised this point several times before. I even talked to Anup Kalbalia about this in person. I hope this won't happen again. (28 Dec '14, 14:35)
 2 Because of the weak hash function, this problem can be passed using a simple O(n) solution. So it's better to select a stronger hash function next time. answered 28 Dec '14, 15:07 6★xllend3 21 accept rate: 0%
 1 Like I said above, the problem is me and not the statements...Or the pre-requisites for each problem!! I believe nobody is born a genius and that most people can be trained and I'm sure that my reality as someone from a country as Portugal where almost no incentive is given to computer programming competitions, is faaaaar different from your realities, possibly being in constant exposure to IOI problems, training for gold in these competitions, writing lots of problems and mastering basic algorithms from an early stage, etc, etc. I only coded a DFS for the first time 1 year ago (and I'm 24 years old now, how's that for a late start?). Given this, I am not making that bad of a progress imho, I even managed to solve some cool mathematical problems I'm proud of and definitely learnt a lot since I first came here... The pre-requisites which you claim are needed to be known by any high school student (DFS Order, LCA, graphs, heavy-light decomposition, FFT, etc) are only thaught at the later years of a Bachelor's degree here in Portugal or only in Master's degree (meaning only someone really choosing Algorithms as a subject to master will be exposed to these concepts, so, as you see, this reality of having high school kids crushing everyone really "confuses me" and makes me feel dumb/useless). I think that some scaling down should be done, according to my own reality, at least. Instead of having only really, really skilled coders setting problems, I think a platform like Codechef would gain a lot if it had the direct intervention of a group of teachers from several universities (India or outside too) which would scale down the problems to be more suitable for "eager kids wanting to learn more" (or maybe I just don't want to face the fact that 15 year old kids solved harder problems than I'll ever solve in my lifetime as a competitor/enthusiast) or at least would follow any university curriculum more closely, I don't really know or maybe my opinion is biased due to the fact that I'm a "loser" :D What do you think? I mean, how did YOU trained for IOI yourself before there was a Codechef? When did you wrote your first DFS? In the end, it's all up to how trained or smart someone is, but, maybe I would change some things, although I'm totally biased to be discussing this... It's just my two cents :) answered 28 Dec '14, 22:22 3★kuruma 17.7k●72●143●209 accept rate: 8%
 0 hi guys, I solved this problem for all the sub tasks only using pre-processing (but it was after the contest). please see the solution link. I just added the contribution due to ith bit of first string to the final answer in the main loop. Complexity O(n log n). answered 28 Dec '14, 23:53 4★dddlll 31●1●3 accept rate: 0% 1 I also did the same... It works :D .. But this we could do only due to the nature of the hash function. Our solution would fail if the hash function is something more complex. (29 Dec '14, 01:29) win_ay6★ You shouldn't be proud because you found a wrong way to solve a nice problem! (29 Dec '14, 01:35)
 toggle preview community wiki:
Preview

By Email:

Markdown Basics

• *italic* or _italic_
• **bold** or __bold__
• image?![alt text](/path/img.jpg "title")
• numbered list: 1. Foo 2. Bar
• to add a line break simply add two spaces to where you would like the new line to be.
• basic HTML tags are also supported
• mathemetical formulas in Latex between \$ symbol

Question tags:

×15,710
×1,343
×334
×71

question asked: 28 Dec '14, 14:03

question was seen: 5,684 times

last updated: 01 Jan '15, 19:46