PROBLEM LINK:Practice Setter: Bogdan Ciobanu DIFFICULTY:MediumHard PREREQUISITES:Linear Independence,Rank of a matrix, Gaussian Elimination, MonteCarlo algorithm and a bit of Randomization. Probabilistic analysis for proof only. PROBLEM:Given a set of Recipes of size $N$ with some recipes being the base recipes while others being the combination of some bases/nonbase recipes. A base decomposition of a recipe is the composition in terms of base recipes present in a recipe. In case two recipes having any same base recipe are merged, the recipe same in both, disappears from the merged recipe. Answer $Q$ queries, each giving $K$ dishes, we need to answer, whether we can select two sets of recipes such that their base decomposition is the same. SUPER QUICK EXPLANATION
EXPLANATIONLet us discuss a solution intuitive from problem statement. Suppose we have $X$ base recipes. If all base recipes are assigned values $2^0, 2^1, \dots 2^{X1}$, we can represent all recipes as binary numbers having exactly $X$ digits, ith bit specifying whether the ith base recipe is in the base decomposition of the current dish. Using this, our queries become, given a set of Binary numbers, all having $X$ digits, Find out whether we can select two sets of values such that if we take the xorsum of all values in both sets, we get the same binary number. One thing we can notice is, that if we merge these two numbers by taking its xor, we obtain 0 as the xorsum. So, now query become, to determine whether we can select a nonempty subset from a set of binary numbers having $X$ digits the xorsum of select elements is 0. Assuming we consider each binary value as a vector having $X$ values, we get a set of $K$ vectors each of size $X$ out of which we need to select a set of vectors such that xorsum of all selected vectors is 0 for all corresponding values. This seems much more familiar than the original statement if we are aware of the concept of Linear Independence of a vector. A vector is said to be linearly dependent on a set of vectors if this vector can be written as a linear combination of other vectors. Hence, a set of vectors is said to be linearly independent, if we cannot write any vector as a linear combination of other vectors. Now, For our problem, we have to answer query whether we can select a nonempty set of vectors such that their xorsum is zero. Suppose we take out one vector from this set. It is not hard to see, that the xorsum of all other vectors in the set is same as the vector removed. This implies, that we are required to check if the given set of Linearly dependent or not. To check this, we can resort to representing this set of vectors in a matrix and use the Row Echelon form of the matrix to find Rank of a matrix, which, is the maximum number of linearly independent vectors in a given set of vectors. If we have the rank of this matrix as $K$, No vector is linearly dependent on other, implying we cannot select a set of vectors which has xorsum zero in all dimensions, Hence, the answer is no in this case. If Rank of this matrix is less than $K$, we can select at least one vector which is the linear combination of a set of vectors, hence, the answer is yes in this case. Now, To find the Rank of a matrix, we use Gaussian Elimination. Procedure for finding the rank of a matrix is simple using row reduction and thus, is a part of the exercise. But, We cannot even store $N$ binary numbers having $X$ digits each since both $X$ and $N$ can range up to $10^5$. Also, Gaussian Elimination would be $O(X*K*min(X, K))$ per query. We need something faster. We can use randomization here. Suppose, we assign each base recipe any random number out of $F_{2^{64}}$ and work the same way. This way, when we can solve this problem faster. But, Using just a random value and 64 bits in place of $X$ bits has some consequences. The consequence is, that though we will report the set of vectors to be dependent when they are actually dependent, we may also end up reporting some linearly independent set of vectors as being linearly dependent. This happens, because of random values assigned to base recipes. I'll give an example of such a case. Three Base recipes numbered 1, 2 and 3 assigned random values 5, 17, and 20. In the query, we are given all three recipes. We can see, that $5^17 = 20$ which may lead to recipe 3 being reported as the linear combination of recipe 1 and 2, while it is not true. Hence, If we run this process a number of times with different random values assigned to base recipes in each run, the probability of still reporting such false positive is negligible. Actually how much, we can see below. Probabilistic Analysis Using the abovementioned randomized solution, we get a $K*64$ matrix having each value either 0 or 1. The probability of a matrix being singular is $\frac{2^{64}1}{2^{64}}$ when $K = 1$. When $K = 2$, The matrix can be singular when either all elements of the row are same as the previous row, or are zero, leading to the probability of no singular as $\frac{2^{64}2}{2^{64*2}}$. For Higher values of $K$, we can see, that the matrix can be transformed into a singular matrix when values in the current row can be represented as the xorsum of a subset of previous $K1$ rows. This happens in $2^{K1}$ cases. Hence, the probability of having $K$th row as a combination of xorsum of a subset of rows, is $\frac{2^{K1}}{2^{64*K}}$, leading to non singular cases as $\frac{2^{64}  2^{K1}}{2^{64*K}}$. We can see, that probability of having matrix singular is the product of probabilities of having the ith row as xor combination of previous rows, leading to the probability of singular matrix as $\frac{\prod{12^{64i}}}{2^{64*K}}$ which is very small for $K$ up to 30. Even a single run of algorithm suffices. Note on Randomization The Randomization we used above, comes under False based Monte Carlo algorithms, which always give the return false when the correct answer is false, but may also give true verdict even when the verdict is actually false. More can be read about Monte Carlo algorithms and probability analysis at Wikipedia page, here. Randomization is really useful and can be effectively employed in problems if you can keep the probability of error below a certain threshold, as required. Try out this Nice Problem Factorize which uses a bit of randomization along with Number Theory, and has a detailed long Editorial. Gaussian Elimination in $O(K^2)$ (specific to this problem only) For this particular problem, we have $K*64$ matrix represented by $K$ values of 64 bits. In Gaussian elimination, when we reduce the contribution of a vector from the remaining rows below the current row, we subtract current row multiplied by a factor to eliminate current dimension by iterating over the whole row. But here, we can use xor in place of subtraction, because $1 = 1(\bmod 2)$. Also, rows have either value 0 or 1, we also do not need to multiply the row by any constant value. This way, we can just xor the value of the current row with all subsequent rows which have current column bit standing. Details can be found in solutions. Time ComplexityTime complexity is $O(N+\sum{p}+Q*K^2*C)$ where $C$ is the number of times we run above algorithm. AUTHOR'S AND TESTER'S SOLUTIONS:Setter's solution Feel free to Share your approach, If it differs. Suggestions are always welcomed. :)
This question is marked "community wiki".
asked 23 Nov '18, 11:45
