FAULT - Editorial



Author: David Stolp
Tester: Hiroto Sekido
Editorialist: Anton Lunyov




Bitwise operations


The best described in the problem statement.


The simplest solution is to output all servers. It will bring you only 13.249427 points, but you can even don’t read distribution of chunks by servers :slight_smile:

The second simplest solution that still allows you to not read distribution of chunks by servers is to output first S − N + 1 servers. Thus, we left only N − 1 servers. It is clear that it is impossible to recover all N chunks having only N − 1 their combinations. Such solution brings you exactly 10 points and actually scoring function provides a hint to such solution :slight_smile:

The third simplest solution is to iterate over all chunks, find the one that belongs to the least number of servers and output servers of this chunk. This will bring a better score 6.378516. This works because after deleting these servers, the chosen chunk is not contained at any other server, so we can’t recover it.

Now we consider the idea that allows to reach the high score.
The input can be represented as S × N system of linear equations over the field Z2. The variables are chunks and the right hand side is composed of servers. If A is the matrix of this system, then A[i][j] = 1 if and only if chunk j is contained on server i (1 ≤ i ≤ S, 1 ≤ j ≤ N).

Claim. The chunks could be always uniquely recovered if and only if all N columns of this matrix are linearly independent over Z2.
Indeed, if columns are linearly independent, then using facts from basic linear algebra we can left only N rows of this matrix such that obtained N × N matrix is non-degenerate. Hence we can recover chunks by servers using inverse matrix. Now assume that columns are linearly dependent and their rank is K < N. Then we can left exactly K linearly independent rows in the system. It is well-known that such system has not-unique solution for every possible vector in right-hand-side.

Thus, we need to delete as few rows as possible such that some column becomes a linear combination of other columns modulo 2. This, in turn, equivalent to performing series of elementary column operations such that some column becomes zero (in the matrix with deleted rows). Elementary column operation is simply a bitwise xor of columns if we consider them as binary numbers.

Instead of deleting some rows and then trying to reach zero column, we can do at first some elementary column operations and then for every column the set of rows we should delete to make this column zero is simply all rows having 1 in this column.

To achieve better performance we could represent columns as collection of 32bit unsigned variables and do bitwise xor 32 times faster (see bitset data structure in author’s solution).

Now the simplest we can do is to perform random bitwise xors while time permits, check whether the column we change have less ones than the answer and update the answer if necessary. To count the number of ones at GNU C++ we can use fast built-in function __builtin_popcount. Otherwise we can precalculate popcount for numbers up to 216 and calculate popcount of 32bit number as a sum of popcount for its lower 16 bits and popcount for its higher 16 bits. Such solution could bring about 4.89 points.

Another method to use this idea is the following.
At each step we select random 1 in the matrix. Let it be in the intersection of the i-th row and the j-th column. Then we do the step like in Gaussian elimination. For each column k other than j, if the i-th element in the k-th column is 1 we replace k-th column by its bitwise xor with the j-th column. Also we check whether the updated k-th column has less ones than the answer and update the answer if necessary. We do these steps while time permits. Such solution could bring you about 3.75 points (note that the winner score is about 3.60).


Author’s solution can be found here but it get WA :slight_smile:
The corrected version is here (the score of this submissions is 3.745888).

Tester’s solution can be found here.




The top solution by the facebook guy winger made my head spin with angular velocity as an irregular function of the number of servers and chunks present in the problem in the given domain. :smiley:

Author’s solution to this problem is giving WA


I performed something like last step along with bitwise but never get better than 3.941693.
can someone please tell me what can i do further in order to get ~3.75 ?

My approach was the one marked as “the third simplest solution” in the editorial. My final score (out of 1) is 0.511. Interestingly, I am the only one with that score!! :stuck_out_tongue:

Surely, everyone must have moved on!!

The test cases generated for the problem were very fair and just(better algos got better points) . I would like to see the algorithm with which these test cases were generated just for learning and out of curiosity.

Can somebody elaborate on the 4th solution given in the editorial which contains the use of Gaussian elimination and the use of elementary transformations.

I didn’t get the gaussian elimination method. Here servers and chunks are denoted along rows and columns respectively, then to convert 1 into 0 they Xor two columns. But isn’t it conceptually wrong to Xor 2 different chunks? Remember that column operations are invalid in Gauss Elimination for solving linear equation.

Here they have assumed servers to be the linear combinations of chunks, which may have 1 or 0 as coefficient, But isn’t it chunks which are linear combinations of servers?

@rahul_nexus @mangatmodi
I’ve updated the editorial.
I hope it becomes more clear now.

Thanks It is much clear now. I was confused because of column operation, as it is illogical to xor two different chunks. However I got it now. I just remembered that column rank = row rank. So, basically we are avoiding row operations as it wouldn’t give ans, but decreasing column rank by column operations followed by some rows deletion.

To me the best thing to take away from @winger’s solution is how he split the possible test cases according to a hash computed over the input values. Then, during the contest, he tried to find the best rand seed for these hash values (i.e the rand seed for which his algorithm produced the best result). This is obviously better than using the same rand seed for all the test cases (that’s what I did) or choosing some random seed (e.g. a time-dependent seed), in which case you have no control over your solution’s performance.


But this approach is time consuming.

@argos >> doesnt matter if it yields you a better score.

@mugurelionut >> thanks for the rand seed explanation. I was not getting that.

Here you go:


thnks … :slight_smile:

@argos: It’s only time consuming in the sense that you have to make many submissions in order to get the rand seed right. However, since there are only 40 test cases (and let’s say that each of them will get a different hash value) and you make, say, 10 submissions for each test cases (with 10 different rand seeds from which you pick the best one), you will get a good score in about 400 submissions. Of course, you will have to make some extra submissions in order to find the hash values of the test cases (e.g. even if your hash takes values from 0 to 999, only at most 40 of them are used).

Thanks for pointing this out. It is fixed.