Given an array of strings find the number of pairs of strings such that their concatenation contains first 20 characters of English alphabet.
Represent each string as a bitmask , where i th bit is set if the ith character is present in the string. The problem reduces to finding the number of pairs in the array such that their bitwise OR is pow(2,20)-1. This can be solved by a dynamic programming where dp* represents the number of supermasks of i. There are numerous ways to create the transitions for this dynamic programming.
Notice that the only piece of information we require from a string is to find for each character if it is present in the string or not. Therefore one can imagine representing the string as a binary array where index ‘i’ in the array tells if the ith character is present in the string or not. A more compact and efficient representation of the same can be achieved by storing it as an integer bitmask,where ith bit corresponds to ith index in the binary array. So what we get is an array of integers corresponding to the strings in the input.
The operation of concatenating two strings just reduces to applying a bitwise OR operation on the bitmask representation of the corresponding strings.
Notice that a string will be super string if all the values in its bitmask representation are 1. That is its integer value is exactly pow(2,20)-1.Lets call this value X.
So now the problem is reduced to find the number of pairs in the array whose bitwise OR is equal to X.
Note that the brute force approach two iterate over all pairs in O(n^2) , so we need something faster.
If for each element we know the count of the number of elements with which this element has a bitwise OR of X, then we are done. But when will an element(e) have a bitwise OR of X with another element ? When the other element is a supermask of (X-e) .
Here by supermask(a) , I mean a bitmask which contains all the 1 bits of a and the 0 bits of a are either 0 or 1 in the bitmask.
Calculating the number of supermasks in the array:
There are a variety of approaches to calculate this quantity. I will explain the author’s approach below:
Let us say that we have calculated the quantity for all the numbers from X …. i+1. How do we calculate the quantity for i?
Let us represent the ith bitmask by x0 x1 x2 …xk3…xk2…xk… xm.
Let bit(i,k) denote kth bit of i from LSB.
We know that an element is a supermask of itself, so we add it to dp*. Now going from LSB to MSB in this mask when we encounter the first 0(let it be the k th bit) , we know that i|bit(k) is a supermask of i. Also clearly all the supermasks of are also supermasks of i . Therefore we can say : dp*+=dp[i|bit(i,k)]. But can we do this for the next 0 index in i(say k2)?
No ! Think why ? (Hint: Are you counting some masks twice ?)
So how do we solve this problem? Suppose we know the number of supermasks of i|(bit(i,k2))such that all the bits to the right of k2 in these bitmasks are the same as that of i . Now we can simply add this quantity to the answer for all the 0 bits in i. How do we calculate this quantity ?
Again dynamic programming!
Let dp2*[j] denote the number of supermasks of i such that in these supermasks the rightmost ‘j’ 0-bits of i are all 0.
Then the recurrences will be:
dp2*[j]=dp*- dp[i|(bit(i,k)) , if j is 1 ( number of supermasks of i in which first 0-bit of i is 0 = Total number of supermasks of i - number of supermasks of i in which the fist 0-bit is set to 1). otherwise: dp2*[j]=dp2*[j-1]-dp2[i|bit(i,k)][j-1] (Similar as above)
So finally dp*=dp[i|bit(i,k)]+dp2[i|bit(i,k1)]+dp2[i|(bit(i,k2)] ….
See author’s solution for details.
There exists a beautiful divide and conquer solution for calculating the dp array defined above. Define f(x,y) as the solution when we update dp counts only from masks in the range x to y. Then by calculating f(x,(x+y)/2) and f((x+y)/2+1,y), f(x,y) can be calculated easily. Can you think how ?
See this solution for details.I strongly recommend seeing other submissions for the problem as well for different approaches on calculating the dp.
AUTHOR’S AND TESTER’S SOLUTIONS:
Author’s solution can be found here