Separate final test cases for Challenge Question.

vineetpaliwal · August 14, 2013, 1:20pm

@admin, @everyone: I believe the final test cases for the Challenge Question should be different from those used during the contest. The practice of making hundreds and thousands of submissions to understand the test cases on the server is becoming very prevalent and is not in the interest of “MAY THE BEST SUBMISSION WIN”.

I sincerely hope the CodeChef admins will look into this.

Once the question is moved to the practice section you can keep both sets of test cases or discard either one, as over-fitting doesn’t matter once the contest has ended.

Things like money, rating and bragging rights are at stake.

I think it is not in CodeChef’s interest anyway if people are making thousands of submissions to learn the test data, since each submission gets run on SPOJ, which costs CodeChef money.

I don’t think contestants who resorted to such a strategy can be blamed, because so far the CodeChef admins have neither raised this issue nor taken preventive action.

mugurelionut · August 14, 2013, 3:47pm

I also don’t like this practice of making lots of submissions in order to tune some solution parameters so that, in the end, the solution works better on the official test data (although I did use this approach in past contests, including the most recent one). On the other hand, I also don’t particularly like the idea of having different tests after the contest (like in TC Marathon matches): there’s something really appealing in having the certainty of knowing your score during the contest (I am talking about the absolute value of the score, not the relative score, which is constantly changing).

However, I believe that using interactive challenge problems with the test cases generated at each submission, like those from the March’13 and May’13 long contests, could partially solve the problem, but not completely.

Anyway, I should say that the challenge problems from the past two contests (July and August) kind of encouraged people to make many submissions because they did not explain the test generation process. When contestants cannot reproduce the type of test cases used during the evaluation on their own computers, there’s no other way to test whether their ideas are good than submitting them. I would like to encourage future challenge problem setters to use test cases whose generation process can be properly described.

vineetpaliwal · August 14, 2013, 5:20pm

@mugurelionut, @djdolls, @admin, @everyone: On the contrary, I believe the test case generation strategy (whether OPEN or HIDDEN) should be the same for in-contest test cases and final test cases. Forcing the setter to explain test case generation may be okay, but what’s the problem if we still use a different set of data, generated using the same test case generation strategy, for the FINAL RESULTS? The problem I am highlighting is independent of whether the test case generation strategy is OPEN or HIDDEN. @djdolls: You will still see people trying to fit their submissions to the test data if the final test cases are not different.
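
A minimal sketch of this idea, assuming a hypothetical generator `generate_case` and made-up seed values (nothing here reflects how CodeChef actually produces its data). It only illustrates that one generation strategy, run with two different seeds, yields a provisional set and a final set drawn from the same distribution:

```python
import random


def generate_case(rng, n=5, max_value=100):
    """Hypothetical generator: n uniform random integers in [1, max_value]."""
    return [rng.randint(1, max_value) for _ in range(n)]


def generate_set(seed, count):
    """Build `count` test cases from one seed, always with the same strategy."""
    rng = random.Random(seed)
    return [generate_case(rng) for _ in range(count)]


# Illustrative seeds only; the final seed would stay secret until the contest ends.
provisional = generate_set(seed=2013, count=5)
final = generate_set(seed=814, count=10)
```

Because both sets come from the same generator, tuning a solution to the general shape of the data still pays off, while tuning it to the five specific provisional cases does not.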

brianfry713 · August 15, 2013, 12:56am

It might be more difficult to hack the input file if it was permuted randomly, so if there are 14 test cases they’d appear in a random order on each submission. That would be fair as everyone would still get the same test cases.
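
A rough sketch of such a per-submission permutation, assuming the judge can seed a shuffle with something tied to the submission. The function name `shuffled_order` and the use of a submission id as the seed are assumptions for illustration; a real judge would presumably want a seed contestants cannot predict:

```python
import random


def shuffled_order(case_ids, submission_id):
    """Return the same test cases in an order that depends on the submission."""
    rng = random.Random(submission_id)  # hypothetical seeding choice
    order = list(case_ids)
    rng.shuffle(order)
    return order


cases = list(range(1, 15))  # the 14 test cases from the example above
order_a = shuffled_order(cases, submission_id=1001)
order_b = shuffled_order(cases, submission_id=1002)
```

Every submission still sees all 14 cases, so total scores stay comparable; only the position of each case in the input changes.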

ACRush · August 28, 2013, 11:50am

@mugurelionut , @djdolls , @admin , @everyone :

I generally agree with the idea of separating the final test cases for the Challenge problem.

(+1) I agree that making thousands of submissions to learn the test data is super boring.

(+10) I prefer challenge problems whose generation process can be properly disclosed.

(+100) I totally agree that the separate final test cases should have the same distribution as the provisional test cases, though the contests needn’t follow the same format as TC Marathon matches.

(-1) Interactive challenge problems may partially solve the problem, but they may also increase the number of submissions.

(-1) We may need a much longer time to test it out.

(-10) One potential problem: what if the (final) submission gets “Wrong Answer” or “TLE” on some final test cases?

Some ideas :

Test the submission on all cases, but only show scores for the first 10%, and use the remaining 90% to determine the winner.
Use relative scores for each testcase.
Capped penalties for “WA” and “TLE” cases.
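
The three ideas can be combined in a small sketch. Everything below is an assumption made for illustration: the function names, the "higher raw score is better" convention, the made-up numbers, and the reading of "capped penalty" as "a failed case costs only that case's points":

```python
def case_score(raw, best, failed_score=0.0):
    """Relative score for one test case; higher raw scores are assumed better.

    `raw` is None when the run ended in WA or TLE. The failure then costs
    only this case's points (a capped penalty), not the whole submission.
    """
    if raw is None or best <= 0:
        return failed_score
    return raw / best


def provisional_and_final(raws, bests, shown_fraction=0.10):
    """Average the first `shown_fraction` of cases for display, the rest for ranking."""
    scores = [case_score(r, b) for r, b in zip(raws, bests)]
    shown = max(1, int(len(scores) * shown_fraction))
    hidden = scores[shown:]
    provisional = sum(scores[:shown]) / shown
    final = sum(hidden) / len(hidden) if hidden else provisional
    return provisional, final


# Made-up raw scores for 10 test cases; None marks a WA/TLE run.
raws = [50, 40, None, 30, 20, 10, 60, 70, 80, 90]
bests = [100] * 10  # best raw score achieved by any contestant, per case
provisional, final = provisional_and_final(raws, bests)
```

With these numbers the contestant sees a provisional score from a single case, while the ranking uses the other nine, one of which contributes zero without wiping out the rest.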

vineetpaliwal · August 28, 2013, 6:49pm

@ACRush: Thanks a ton for your support on this issue and for sharing your opinions on an important matter.

vineetpaliwal · September 3, 2013, 1:28pm

@admin, @djdolls, @ACRush, @mugurelionut, @brianfry713, @betlista:

How about having a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) before every challenge problem submission? That would prevent people from using scripts and force them to do whatever experimentation they need manually, which they can do only to a limited extent in a 10-day contest.

This should reduce the problem considerably, as last time the maximum number of submissions on the challenge problem by a single user was around 5000.

Looking forward to a “FAIR” September LONG contest, where I don’t see more than 500 submissions for the challenge problem by anyone. Well, that’s my idea of “FAIR”.

@everyone: Your opinions are welcome.

vineetpaliwal · September 5, 2013, 8:18pm

10 days/contest * 24 hours/day * 60 minutes/hour = 14400 minutes/contest

A 6 minute gap between submissions would mean 14400 / 6 = 2400 maximum submissions

And if people are not using scripts, that means they make submissions only during the day, while they are sitting at their system (say about half the day, 12 hours/day), which would mean at most 1200 submissions.
Don’t know if this is enough to solve the issue at hand. But it could be a good step nevertheless.
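
The arithmetic above can be checked with a few lines. The helper name `max_submissions` and its parameters are made up for this sketch:

```python
def max_submissions(contest_days, gap_seconds, active_hours_per_day=24):
    """Upper bound on submissions given a minimum gap between them."""
    active_seconds = contest_days * active_hours_per_day * 3600
    return active_seconds // gap_seconds


around_the_clock = max_submissions(10, gap_seconds=6 * 60)
half_days_only = max_submissions(10, gap_seconds=6 * 60, active_hours_per_day=12)
print(around_the_clock, half_days_only)  # 2400 1200
```

For comparison, the same function with the 30-second minimum gap mentioned elsewhere in this thread gives a ceiling of 28800 submissions for a script running around the clock.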

@admin: Do people really use a command line browser to access CodeChef? Is there any evidence of it?

Not being able to use some utility to make submissions for just 1 out of 10 problems should not be a matter of concern.

Similarly, frustration should be limited, as we ask for a CAPTCHA only for the CHALLENGE problem and not otherwise.

All my suggestions in this thread relate only to the CHALLENGE problem and should apply only to it.

utkarsh_lath · September 6, 2013, 9:31pm

In my opinion, the following are a must in the interest of fairness:
a) Test data generation should be made public. This is because my final solution strongly depends on what generation scheme is used.

Theoretically, I should be able to decide which of my schemes is better.
For inputs with multiple parameters, strategies often depend strongly on the relative distribution of the parameters, and I must know it beforehand.
Test data can be designed in an adversarial fashion for some “good” schemes, and if I am not aware of it, my “good” scheme could actually end up doing worse than a “bad” scheme.
People can spend time more usefully cooking up solutions rather than figuring out the test cases. Nobody likes doing that, but people are left with little choice.

b) Final test data should be different from the one used during contests.

People won’t make 1000s of submissions trying to align their strategies with the judge’s test data.
People can rate their solutions offline and be assured that the result is a good enough estimate of the actual score they are going to get.
The better strategy will win with more probability as no test data specific hacks will work.

If a) and b) are enforced, then the number of submissions will go down automatically, without the need for a captcha and the like.

We can allow people to mark some 5-10 submissions, each of which is run on the final test data (say, the last 10 submissions). This is because they could have used different schemes that give similar results, and they may want all of them to be used for final testing. Keeping this number small will ensure that people only put forward solutions with different ideas/schemes, while still leaving room and incentive for more creativity.

karan173 · August 14, 2013, 1:44pm

Also, I think there should be an upper limit on the no. of submissions that can be made for a challenge problem.

djdolls · August 14, 2013, 5:08pm

I agree with mugurelionut completely. The problem can be avoided by explaining the test case generation process like in most previous long contests.

betlista · August 14, 2013, 5:34pm

Just a small question: if there are two test data sets, do you want all of my submissions to be executed for the final score, or not?

Typically the reason for multiple submissions is that randomization is used, so coders are just trying to have better luck…

vineetpaliwal · August 14, 2013, 5:37pm

@betlista: I would want the last submission made during the contest to be used for final scoring.

samjay · August 14, 2013, 10:56pm

Those were very good challenge problems as well. And I also wonder what happens if it suddenly gets TLE on the new test data (or is maybe suddenly wrong). Then I would score zero points? I don’t like that idea much; for this problem, for example, I got both AC and TLE for the same code, so changing the data may put it even more at risk.

betlista · August 15, 2013, 1:14am

great idea: simple and fair

It can be used for regular problems as well, and tricks for finding which input your program fails on become useless (but it would be very bad if the input file had a wrong format).

vineetpaliwal · August 15, 2013, 1:20am

@samjay: Suppose there are 10 final test cases and 5 contest-time test cases. Then when you submit during the contest, your code is run on all 15 test cases, but you are shown the score for only the 5 contest-time test cases, while at the end of the contest it changes to the score for the other 10 test cases. You will not get a correct answer verdict during the contest if your code gave “Wrong Answer”, “TLE” or “RE”, or if some other problem occurred.

admin · September 4, 2013, 3:46am

This may break things for those who use a command line browser or have built a command line utility to make submissions. It might also be frustrating for users to enter a captcha each time they want to make a submission. We are open to suggestions on this.

mugurelionut · September 4, 2013, 1:58pm

hm… I made all my submissions manually (for all the challenge problems so far) and I still ended up with a bit more than 2300 submissions for the August’13 challenge problem (and I did not spend all my time on the challenge problem, as I also have other things to do in my day-to-day life). With more perseverance and dedication I guess it is possible to reach even around 5000 submissions manually. Nevertheless, a captcha would definitely slow things down a bit (as would a higher minimum duration between submissions, which is currently 30 seconds).

betlista · September 5, 2013, 2:01pm

2300 manual submissions? What the hell? I have to learn a lot

eagle_eye · September 5, 2013, 9:38pm

6 minutes is too long a break to take… it would be very boring to deal with…
@vineetpaliwal no one has as much patience as you have… I think this idea won’t work.
What can be done is to make strong test cases with large limits so that they can’t be recognised through submissions (assert) + one can have different final test cases for the final result (which I completely support).