Weak test cases highlighted again..

Once again, the problem of test cases being extremely weak was highlighted in the FEBRUARY long challenge. The problem “Call centre Scheduling” was probably a good one but it contained weak test cases. Even my greedy solution which I think could fail on some corner cases passed easily. You can have a look at My solution .

The more weird think was that this solution also passed the test cases which actually doesn’t implement any algorithm, just prints out “yes” and “no” randomly. The interesting thing is that the program uses “srand((unsigned)time(NULL));” which actually generates random numbers according to the system time, meaning the same solution may fail sometimes but gets AC other times.

Really it was annoying to see such solutions getting AC in such renowned contests. I hope Codechef looks into much matters carefully. I rejudge of the problem with Strong test case data would be the best solution. I would not mind even my solution getting WA after that but it would do justice to others as well.

Looking forward for some quick action.

4 Likes

The solution you mention is by far not random (except for one tiny case where it uses rand() % 2). It identified many conditions to “isolate” each test file. And since the answer for each test file is equivalent to a number between 0 and 31 (i.e. 0 … 2^5 - 1, since there are only 5 test cases per file), the user probably made multiple submissions to figure out the answer for each test case and this was just the last submission.

To be more precise, imagine that you make 32 submissions and in each one you print the same answer, from 0 to 31. Then, for each test file, you would know the exact answers (since the system shows AC/WA for each test file). Now, you need some additional conditions to say something like: if test_file = 1 then print answer1; if test_file = 2 then print answer2, etc.

The submission you pointed to does this by considering the values of the parameters: h, d, p, and, in addition, sum and x. It seems, though, that there was a case when the user couldn’t tell the difference between 2 test cases - that’s when it used rand() % 2. Trying this several times (even just two times, maybe :slight_smile: ) would definitely result in the proper “selection” of test cases.

So I’d say the test cases were not weak. The problem, as it was designed, could have been “hacked”: by anyone (i.e. anyone could score 100 points without really solving it). So the whole problem was flawed. In general, every problem whose answer per test file is or can be encoded as a “small” number can be hacked by this method.

11 Likes

The question Sereja and Two Lines also had weak Test cases.

All I did was, for all (i,j) :

x = find the maximum in row i, and add it with the number of times it appeared in column j

y = find the maximum in column i, and add it with the number of times times it appeared in row j

ans = max(ans,x,y);

Clearly, this should not work. But it did for the bigger sub-tasks, and failed for the smallest one.

Thankfully, that could be done by O(n * m * min(n,m)) to get full 100 points! :slight_smile:

See this for solution : CodeChef: Practical coding for everyone

@mugurelinout: Earlier we were thinking of having just 1 test case, but we changed it to having 5 test cases precisely due to this reason. We would be more careful in the future regarding “YES”/“NO” type questions. Thanks a lot for your detailed explanation.

1 Like

@dpraveen: This problem should have had 15+ test cases per file in order to make sure it needs to be solved properly. By the way, I made the same point last year, regarding problem SEZ from the MAY 2015 long contest: see my answer from SEZ - Editorial - editorial - CodeChef Discuss . I think no one took advantage of this at the time, but people got smarter now :slight_smile: [ or they read my post from that problem :stuck_out_tongue: ]

2 Likes

@mugurelionut How can you check that test_file == 1 or 2?

@mugurelionut, can you explain whether my greedy approach was correct or not?