After the obvious cheating case from DEC’14 (you can read here about it), which however was not handled in any way by Codechef, the users @msm1993 and @kutengine cheated again - this time in a very elaborate manner. At first sight it seems that their solutions are very different - and on the surface they are. But the core idea (and even the data encoded in their source code) is exactly the same.
This is @msm1993’s top submission: http://www.codechef.com/viewsolution/5874709
This is @kutengine’s top submission: http://www.codechef.com/viewsolution/5875762
If one just looks at them they seem very different (and indeed I thought the same thing, too, at first). However, they are in fact almost the same solution at the core - but implemented in two different ways.
@msm1993 has a function called void init_permutation(), which does exactly the same thing as @kutengine’s void init_swap_positions() function from the Permutation namespace. Note that even the number of generated permutations is the same (8000000 in both cases) and they both use the value 6666 when deciding which permutations to store (in @msm1993’s solution it is #define MOD 6666, while in @kutengine’s solution it is const int MDN = 1111 * 6; - as if it were hard to do the math ).
The differences do not stop here. Both @msm1993 and @kutengine have a bunch of data encoded in their solutions, which is data tuned to the official test cases. Although encoded differently, this data is the same.
I will start with the data in @kutengine’s solution. Let’s take the array char pps[6][10005]. The first string in this array starts with “echzdfzdhhzejz”. If we look at the function void init() { from the namespace ZipInt, which decodes this data, we see that ‘z’ is a delimiter and that the other letters are digits (‘a’=0, ‘b’=1, ‘c’=2, …). Thus, this array contains in fact numbers. If we decode the numbers from the prefix I extracted we get: “ech”=427, “df”=35, “dhh”=377, “ej”=49.
@msm1993 encodes the same data in a much more elaborate way, in the arrays int p[][ML], int f[][ML] and int e[][ML]. These arrays are decoded within the class called FastRandomGenerator. By looking at the class we see first that it selects prime numbers in the int gen[ML] array, by using a sieve-based algorithm. It gets gen[0]=2, gen[1]=3, gen[2]=5, gen[3]=7, gen[4]=11, gen[5]=13, gen[6]=17, etc. The gen array is used in the next() function of the FastRandomGenerator class, which decodes the p, f and e arrays. By looking at the code we see that p says the number of elements of f to consider. f contains indices in the gen array (i.e. indices of prime numbers) and e contains the exponent of these prime numbers.
Here’s a prefix of the p, f and e arrays from @msm1993’s solution:
- int p[][ML] = { {3, 1, 1, 2
- int f[][ML] = { {1, 4, 5, 11, 74, 1, 6
- int e[][ML] = { {1, 1, 1, 1, 1, 1, 1
The first number returned by the first call of the next() function considers the first 3 numbers of the array f: gen[1] x gen[4] x gen[5] = 3 x 11 x 13 = 429. Note that the function subtracts 2 from this multiplication, so the first returned number is 427 (the same as in @kutengine’s solution).
The second number returned by next() is simply gen[11]-2. You can check here that gen[11]=37 and, thus, the function returns 35 (the same as @kutengine’s 2nd number).
The 3rd number returned by next() is gen[74]-2. Use the same link to see that gen[74]=379 and, thus, the returned number is 377 (the same as @kutengine’s 3rd number).
The 4th number returned by next() considers the following 2 values from the array f (because the corresponding value from the p array is 2). Thus, it returns gen[1] x gen[6] - 2 = 3x17-2 = 49 (the same as @kutengine’s 4th number).
I will not continue with the other numbers because there are too many of them, but I hope that the pattern is clear.
@kutengine and @msm1993 implemented basically the same core solution which contains the same data tuned on the official test cases. However, in order to deceive anyone who would look at both their source codes, they implemented things differently and encoded the data in different formats, in order to seem that they have unrelated solutions.
For me the annoying part here is that what they did is essentially equivalent to using 2 accounts in order to be able to make twice as many submissions to the challenge problem (and the more submissions one makes, the more data one gets and, thus, one’s solution is tuned much more to the official test cases). Then they used the data obtained by making submissions to the challenge problem from 2 accounts. I can say that if I had been able to make 1000 submissions to the challenge problem (instead of only 500), I could have tuned my solution much more and I could have obtained a significantly better score than the one I did with only 500 submissions (this assuming I would have had the time to make 1000 submissions).
I will also send an email to feedback@codechef.com to report this case, but I am very skeptical that they will do anything. Since they did nothing to punish @msm1993 and @kutengine in DEC’14 long contest, where their source codes were almost identical, I don’t think they will do anything now, when @msm1993 and @kutengine tried very hard to hide the fact that they submitted essentially identical solutions.
I wrote an email about this to feedback@codechef.com and I hope they will take action against @msm1993 and @kutengine, the same way they did for DEC’14.
Edit: I looked at @msm1993’s and @kutengine’s submissions for the other 9 problems of JAN’15 long contest and they seem to follow the same pattern, i.e. they are the same idea implemented in slightly different ways. The most obvious to me are their submissions for the XRQRS problem - the submissions have similar running times and the code is obviously modified in order to seem different, despite being essentially the same.
This is @msm1993’s submission for XRQRS: http://www.codechef.com/viewsolution/5856986
And this is @kutengine’s submission for XRQRS: http://www.codechef.com/viewsolution/5827335
The fast I/O function was renamed and slightly changed and the query functions were slightly modified (in one case they are numbered from 0 to 4 and in the other case they are numbered 1 to 5), but if one pays attention, it is obvious that it’s the exact same solution modified in order to “seem” different.
Edit 2: The Codechef team told me (upon asking them) that @msm1993 and @kutengine replied to them and, upon examining their case better, they decided that this was not cheating. Thus, no action was taken against them. Although I believe the conclusion to be wrong, there is nothing more that I can do.