@mugurelionut, @djdolls, @admin, @everyone: On the contrary, I believe the test case generation strategy (whether OPEN or HIDDEN) should be the same for in-contest test cases and final test cases. Forcing the setter to explain the test case generation may be okay, but still, what’s the problem if we use a different set of data, generated using the same test case generation strategy, for FINAL RESULTS? The problem I am highlighting is independent of whether the test case generation strategy is OPEN or HIDDEN. @djdolls: You will still see people trying to fit their submissions to the test data if the final test cases are not different.
It might be more difficult to hack the input file if it were permuted randomly: if there are 14 test cases, they’d appear in a random order on each submission. That would still be fair, as everyone would get the same test cases.
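One way such per-submission shuffling could work, as a hypothetical sketch (the function and its seeding scheme are my own illustration, not anything the actual judge does):

```python
import random

def permuted_order(test_case_ids, submission_id):
    """Return the test cases in a pseudo-random order that differs
    between submissions but is reproducible: seeding the shuffle with
    the submission id means re-judging the same submission replays
    the same order. Illustrative sketch only."""
    order = list(test_case_ids)
    random.Random(submission_id).shuffle(order)
    return order

# Example: 14 test cases appear in some shuffled order per submission.
cases = list(range(1, 15))
print(permuted_order(cases, submission_id=101))
```

Every submission still runs against the same set of cases; only the order in which they appear changes, which makes mapping observed behaviour back to individual input files harder.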
I generally agree with the idea of separating the final test cases for the Challenge problem.
(+1) I agree with the point that thousands of submissions made just to learn the test data are super boring.
(+10) I prefer challenge problems whose generation process can be properly opened.
(+100) I totally agree that the separate final test cases should have the same distribution as the provisional test cases, though the contests needn’t be in the same format as TC Marathon matches.
(-1) Interactive challenge problems may partially solve the problem, but they may also increase the number of submissions.
(-1) We may need a much longer time to test it out.
(-10) One potential problem: what if the (final) submission gets “Wrong Answer” or “TLE” on some final test cases?
Some ideas:
- Test the submission on all cases, but only show scores for the first 10%, and use the remaining 90% to determine the winner.
- Use relative scores for each test case.
- Capped penalties for “WA” and “TLE” cases.
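The three ideas above could be combined roughly like this (purely illustrative: the function name, the penalty value, and the exact scoring formula are all my own assumptions, not an actual judge implementation):

```python
def score_submission(results, best_scores, penalty_cap=0.1):
    """Sketch of relative scoring with capped penalties:
    - results: per-test-case raw scores, with None meaning WA/TLE
    - best_scores: best raw score seen on each case (for relative scoring)
    - a failed case contributes a small capped penalty score instead of
      zeroing out the whole submission."""
    total = 0.0
    for raw, best in zip(results, best_scores):
        if raw is None:            # "WA" or "TLE" on this case
            total += penalty_cap   # capped penalty, not an overall zero
        else:
            total += raw / best    # relative score in [0, 1]
    return total / len(results)

# A submission that fails one of four cases still gets partial credit.
print(score_submission([80, None, 50, 100], [100, 100, 100, 100]))  # 0.6
```

Under this scheme a sudden WA/TLE on unseen final data hurts, but does not wipe out the score, which addresses the (-10) concern above.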
@ACRush: Thanks a ton for your support on this issue and for sharing your opinions on an important matter.
@admin, @djdolls, @ACRush, @mugurelionut, @brianfry713, @betlista:
How about having a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) before every challenge problem submission? That would prevent people from using scripts and force them to do whatever experimentation they need manually, which they can only do in a limited way in a 10-day contest.
This should reduce the problem considerably, as last time the maximum number of submissions on the challenge problem by a single user was around 5000.
Looking forward to a “FAIR” September LONG contest, where I don’t see more than 500 submissions for the challenge problem by anyone. Well, that’s my idea of “FAIR”.
@everyone: Your opinions are welcome.
10 days/contest * 24 hours/day * 60 minutes/hour = 14400 minutes/contest
A 6 minute gap between submissions would mean 14400 / 6 = 2400 maximum submissions
And if people are not using scripts, that means they make submissions only during the day, while sitting at the computer (say about half the day, 12 hours/day), which would mean at most 1200 submissions.
Don’t know if this is enough to solve the issue at hand, but it could be a good step nevertheless.
@admin: Do people really use a command-line browser to access CodeChef? Is there any evidence of it?
Not being able to use some utility to make submissions for just 1 out of 10 problems should not be a matter of concern.
Similarly, frustration should be limited, as we ask for a CAPTCHA only for the CHALLENGE problem and not otherwise.
All my suggestions on this thread relate only to the CHALLENGE problem and should apply only to it.
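The back-of-the-envelope arithmetic above can be checked in a few lines:

```python
# Submission caps implied by a minimum gap between submissions.
contest_minutes = 10 * 24 * 60           # 10 days -> 14400 minutes
gap_minutes = 6                          # proposed minimum gap

max_with_scripts = contest_minutes // gap_minutes        # round-the-clock
max_manual = (contest_minutes // 2) // gap_minutes       # awake ~12 h/day

print(contest_minutes, max_with_scripts, max_manual)  # 14400 2400 1200
```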
In my opinion, the following are a must in the interest of fairness:
a) Test data generation should be made public, because my final solution strongly depends on what generation scheme is used.
- Theoretically, I should be able to decide which of my schemes is better.
- For inputs with multiple parameters, strategies often strongly depend on the relative distribution of the parameters, and I must know it beforehand.
- Test data can be designed in an adversarial fashion against some “good” schemes, and if I am not aware of it, my “good” scheme could actually end up doing worse than a “bad” scheme.
- People can spend their time more usefully on cooking up solutions rather than on figuring out the test cases. Nobody likes to do it, but people are left with little choice.
b) Final test data should be different from the data used during the contest.
- People won’t make 1000s of submissions trying to align their strategies with the judge’s test data.
- People can rate their solutions offline and be assured that it is a good enough estimate of the actual score they are going to get.
- The better strategy will win with higher probability, as no test-data-specific hacks will work.
If a) and b) are enforced, then the number of submissions will go down automatically, without the need for a captcha and the like.
We can allow people to mark some 5-10 submissions (say, among the last 10), each of which will be run on the final test data. This is because they could have different schemes with similar provisional results, and they may want all of them to be used for final testing. Keeping this number small will ensure that people only mark solutions with genuinely different ideas/schemes, while still allowing room/incentive for more creativity.
Also, I think there should be an upper limit on the number of submissions that can be made for a challenge problem.
I agree with mugurelionut completely. The problem can be avoided by explaining the test case generation process like in most previous long contests.
Just a small question: if there are two test data sets, do you want all my submissions to be executed for the final score, or not?
Typically the reason for multiple submissions is that randomization is used, so coders are just trying to get better luck…
@betlista: I would want the last submission made during the contest to be used for final scoring.
Those were very good challenge problems as well. And I also wonder: what if my solution suddenly got TLE on the new test data (or was suddenly wrong)? Then I would score zero points? I don’t like that idea much. On this problem, for example, I got AC and TLE for the same code, so changing the data may put it even more at risk.
Great idea: simple and fair.
And it can be used for regular problems as well; tricks to find which input your program is failing on become useless (but it will be very bad if there is a wrong format in the input file).
@samjay: Suppose there are 10 final test cases and 5 in-contest test cases. When you submit during the contest, your code is run on all 15 test cases, but you are shown the score of only the 5 in-contest cases; at the end of the contest, that changes to the score of the other 10 test cases. You will not see a correct-answer verdict during the contest if your code gave “Wrong Answer” or “TLE” or “RE”, or some other problem occurred.
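The provisional/final split described in this post could be sketched as follows (the function name, case counts, and scores are illustrative only, not the actual judge's behaviour):

```python
def displayed_score(per_case_scores, n_provisional, contest_running):
    """All cases are judged on every submission, but during the contest
    only the first n_provisional per-case scores are shown; after the
    contest ends, the remaining (final) cases determine the standings."""
    if contest_running:
        shown = per_case_scores[:n_provisional]   # e.g. 5 in-contest cases
    else:
        shown = per_case_scores[n_provisional:]   # e.g. 10 final cases
    return sum(shown)

# 5 provisional cases followed by 10 final cases (made-up scores).
scores = [10, 20, 30, 40, 50, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(displayed_score(scores, 5, contest_running=True))   # 150
print(displayed_score(scores, 5, contest_running=False))  # 55
```

The key point is that the submission is judged on everything up front; only which subset of scores is revealed changes when the contest ends.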
This may break things for those who use a command line browser or have built a command line utility to make submissions. It might also be frustrating for users to enter a captcha each time they want to make a submission. We are open to suggestions on this.
hm… I made all my submissions manually (for all the challenge problems so far) and I still ended up with a bit more than 2300 submissions for the August’13 challenge problem (and I did not spend all my time on the challenge problem, as I also have other things to do in my day-to-day life). With more perseverance and dedication I guess it is possible to reach even around 5000 submissions manually. Nevertheless, a captcha would definitely slow things down a bit (as would a higher minimum duration between submissions, which is currently 30 seconds).
2300 manual submissions? What the hell? I have to learn a lot
A 6-minute gap between submissions is too long a break to take… it would be very boring to tackle…
@vineetpaliwal, no one has as much patience as you have… I think this idea won’t work.
What can be done is to make strong test cases with large limits so that they can’t be recognized by submissions (asserts), plus have different final test cases for the final result (which I completely support).
@eagle_eye: Thanks for the feedback. I too feel that things like a time limit between submissions or the use of a CAPTCHA are just unnecessary workarounds. I still vouch for separate in-contest and post-contest test files for the CHALLENGE problem.
Hi Vineet, we have made some changes to the scoring of challenge problem starting from October Challenge 2013. You can read them all here: http://blog.codechef.com/2013/10/03/challenge-problem-scoring-changes/