Invitation to CodeChef April Long Challenge 2018!

alexthelemon · April 17, 2018, 11:34pm

aryanc403: I appreciate that this is an allowed strategy and is not trivial to carry out, but I don’t think it is a good one to encourage because it deflects from the primary purpose of making a good algorithm. I know (as you suggest) that I could do the same thing myself, but I am not going to do this: if that is the only way to win then I’d prefer to compete elsewhere instead. I don’t think I’m the only one to think this, because vijju said that the maximum number of submissions has been reduced to try to prevent this. (I have a suggestion as to how to modify the rules in a separate post below.)

mgch · April 17, 2018, 11:49pm

@alexthelemon I strongly agree with you that reverse-engineering isn’t fair. Unfortunately, it hasn’t been prohibited until now, hence it’s allowed. Probably, it was the main reason why some contestants were too good at challenges for years. We’ll discuss it and I hope we find some solution (hidden time/memory and few submissions sounds good). Thanks for your feedback!! Also, congrats on winning Div2!! Good luck in Div1

vijju123 · April 18, 2018, 12:37am

In my opinion it is better if the competition is to find the program that works best on a random problem instance

Oh! Are you suggesting that the TC on which the program runs should be dynamically generated rather than being a fixed case?

We fear that some contestants may get unlucky (or too lucky; both are bad). Like, once in the last problem of a long (something on squarefree numbers) the TC were dynamically generated. My solution, which got TLE on some cases, got accepted on the 5th try. We will need to find a way to minimize, or even prevent, these instances from happening if we are to implement it
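To put a rough number on the "lucky resubmission" worry: if a borderline solution passes a freshly generated test set with some probability p, independently on each try, then the chance of at least one AC in k tries is 1 - (1 - p)^k. The sketch below just evaluates that formula; the value of p is a made-up illustration, not something measured from the contest.

```python
def chance_of_ac(p, tries):
    """Probability of at least one AC in `tries` independent submissions.

    p is the (hypothetical) chance that a single submission passes a
    freshly generated test set.
    """
    return 1.0 - (1.0 - p) ** tries


# Assumed example: a solution that passes only 20% of the time
# still has roughly a two-in-three chance of AC within 5 tries.
example = chance_of_ac(0.2, 5)
```

So even a solution that usually fails can get through with a handful of resubmissions, which is the kind of luck that fixed test cases avoid.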

vijju123 · April 18, 2018, 12:40am

I think we can implement hiding the time and memory taken for the challenge problem, merely telling if it’s AC’d or not. That can help a lot.

I think we can do away with telling the “Score” of the problem; merely telling how many points it fetched you out of 100 seems good.

The suggestion to “submit up to 200 solutions, out of which at most 20 (which ran on hidden TC) will be considered for leaderboard” seems nice. 20 submissions limits reverse engineering by a lot.

Already pinged @admin to collect feedback by tomorrow or day after, so feel free to suggest

vijju123 · April 18, 2018, 12:57am

Yes, they require approval to be public, else people paste all sorts of code and ideone links. I once decided to get them all disqualified, but later felt it would be too harsh on those who are new. Perhaps they didn’t bother to read the rules.

Yeah, I tried to answer as many comments as I can xD. For the updated versions also, big thanks to @mgch, Misha is one of the best people out there

alexthelemon · April 18, 2018, 1:36am

No I wasn’t suggesting that the test case should be different for different people. That would make it far too random. You definitely need the same test cases for everyone, but no information about them should leak out. That way the problem from the programmer’s point of view is to get the best result on a random instance (because he or she knows nothing about the test case, so it is effectively a random instance from their point of view).

alexthelemon · April 18, 2018, 1:41am

I didn’t mean to suggest that “at most 20 will be considered for the leaderboard”. Sorry if I wasn’t clear.

I was suggesting that when you submit a challenge problem solution you should have an extra option called “receive return code from hidden instances”. You are allowed to select this option at most (say) 20 times during the competition. When you select this option, if you get an AC it means you can be sure that your program worked for the hidden TCs.

The reason for restricting return codes like this is that the mechanism for information leaking back to the user is via the ret code…
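A minimal sketch of how such a limited "receive return code from hidden instances" option might be tracked on the judge side. The class name, method name and the verdict strings are hypothetical, and the limit of 20 is just the "say 20" figure from the post, not an actual CodeChef setting.

```python
class FeedbackBudget:
    """Tracks how often a contestant may ask for the hidden return code."""

    def __init__(self, limit=20):
        self.limit = limit  # assumed cap, "say 20" in the proposal
        self.used = 0

    def report(self, hidden_verdict, want_return_code):
        # Without the option, nothing about the hidden instances leaks out.
        if not want_return_code:
            return None
        if self.used >= self.limit:
            raise RuntimeError("return-code requests exhausted")
        self.used += 1
        return hidden_verdict
```

With this shape, ordinary submissions return `None` for the hidden instances, and only the limited number of opted-in submissions reveal a verdict such as "AC".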

vijju123 · April 18, 2018, 1:42am

I get it now. Thanks

alexthelemon · April 18, 2018, 1:50am

In addition, the time and memory information that you get back should only be for the “feedback-instances”.

And I think it is also probably better not to include the feedback-instances in the final score because too much is known about them. (Another reason: if you do include them, as happens now, then this makes @algmyr’s suggestion not work properly. That is, even if only the last submitted program is scored in the final ranking, you still have an incentive to keep resubmitting a random algorithm until you get a good visible score, even though you can only see part of your score.)

algmyr · April 18, 2018, 2:13am

Personally I think one solution would be to completely separate provisional tests from the actual tests. Provisional tests will only give a temporary hint of the performance of programs, but is not included in the actual set of tests. No information should be given on the hidden tests. Since the data generation algorithm is given you can easily test performance on your own system, and the provisional tests should catch most server side stuff.

Combined with my earlier comment about only judging the last submission this should both prevent reverse engineering and discourage spam submissions.

alexthelemon · April 18, 2018, 4:46am

I would be happy with that option (no feedback from actual tests at all), but I got the impression people wanted a bit of certainty that their programs would still work with the actual tests, so I made the above suggestion (20 return codes) as a compromise. But your suggestion has the virtue of simplicity, and as you say it’s unlikely a program that passes the provisional tests would fail to complete the actual tests (though you could just about imagine that it runs in 3.96 seconds on the provisional tests, but TLEs at 4.01 seconds on the actual tests due to a slight difference in the data).

alexthelemon · April 18, 2018, 4:58am

Judging on the last submission only is tempting, though it might make the comparison a bit more random because the tail of the score distribution obtained from a random algorithm (which you get from maxing over lots of attempts) probably has less variance than a single instance.

But this could be fixed by increasing the number of hidden test cases. And this wouldn’t require extra server time compared to what happens now because the server would only need to run a single entry, not all 200. (Though it may delay scoring slightly after the contest if they aren’t being run pre-emptively.)
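The variance point can be checked with a small simulation. This is only a sketch under an assumed score model (each run of a randomized algorithm scores a Gaussian value with mean 100 and standard deviation 10); the real score distribution of a challenge problem is unknown, and the function names are made up for illustration.

```python
import random
import statistics


def single_score(rng):
    # Assumed model: one run of a randomized algorithm on a fixed test.
    return rng.gauss(100.0, 10.0)


def best_of(rng, attempts):
    # Score kept when the best of many submissions is used.
    return max(single_score(rng) for _ in range(attempts))


def compare_spread(trials=500, attempts=200, seed=1):
    """Return (stdev of a single run, stdev of the best of `attempts` runs)."""
    rng = random.Random(seed)
    single = [single_score(rng) for _ in range(trials)]
    best = [best_of(rng, attempts) for _ in range(trials)]
    return statistics.stdev(single), statistics.stdev(best)
```

Under this model the best-of-200 score varies noticeably less between contestants than a single run does, which is why judging one submission would want more hidden test cases to average over.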

admin · April 18, 2018, 7:39pm

@algmyr, that’s a very interesting suggestion. We will discuss this. Thanks!

algmyr · April 18, 2018, 9:13pm

@admin Also check out my comment under Invitation to CodeChef April Long Challenge 2018! - #13 by alexthelemon - general - CodeChef Discuss, it expands the idea into a more complete solution, both regarding reverse engineering and spam submissions.

mgch · April 18, 2018, 9:38pm

@algmyr what is the sense of having provisional tests there? You can optimize the solution for them and end up overfitting (as they say in ML), and in the end your time will be wasted because the final tests will be completely different. There is almost no sense in checking the solutions during the contest, am I wrong? I have another suggestion: what if we try to use multitests in the challenge (around 50-1000 per test case, with different types combined) and feedback is provided on only 5-10% of the data? I guess it will be hard for unfair solutions to get the test data. What do you think about that?
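A rough sketch of the multitest idea: pick a small random subset of the tests inside one file for feedback and keep the rest hidden. The function name, the seed and the exact fraction are assumptions for illustration only.

```python
import random


def split_tests(num_tests, feedback_fraction, seed):
    """Split test indices into a small feedback set and a hidden set."""
    rng = random.Random(seed)
    ids = list(range(num_tests))
    rng.shuffle(ids)
    # At least one test always gives feedback.
    k = max(1, round(num_tests * feedback_fraction))
    return sorted(ids[:k]), sorted(ids[k:])


# Assumed example: 1000 tests in one file, feedback on 5% of them.
feedback_ids, hidden_ids = split_tests(1000, 0.05, seed=0)
```

Because the contestant does not know which indices ended up in the feedback set, a verdict or score reveals much less about any individual hidden test.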

admin · April 18, 2018, 10:16pm

Thank you for this informative discussion. We will definitely take these points into consideration while figuring out what to do. We’ll get back to you soon.

vijju123 · April 18, 2018, 10:21pm

Yes, solutions which have T=1 are far more prone to this. With mixed solutions of different kinds, I think we can minimize the issue by a good factor.

algmyr · April 18, 2018, 11:20pm

@mgch Provisional tests would be there only to give a rough indicator of how you stack up. If you have a solution that performs consistently it will also be a decent estimate of your final score, similar to what the visible test is today. Even today you have no idea if the hidden tests are vastly different; you pretty much presume that the visible test is representative already. Also, importantly, you are given the data generation algorithm so that you can generate your own test cases to benchmark your program and see that it performs well in general.

algmyr · April 18, 2018, 11:28pm

@mgch If you’re worried that the provisional test cases are not representative you could always add a few more cases of each type to reduce impact of potential outliers. If the final tests are run after the competition (and only on the final submission) this would still be less computationally intensive than running the full test suite on every submission as it’s done today (from what I’ve understood). What I fundamentally would like to enforce is a separation between sets of test cases so you can’t gather information on the point giving tests during the competition.
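The separation could look roughly like the sketch below: provisional tests give feedback during the contest, and the hidden tests are run once afterwards, only on each user's last submission. All names here are hypothetical, and a "solution" is modelled as a plain function that returns a score for one test.

```python
def evaluate(solution, tests):
    # Total score of one solution over a list of test inputs.
    return sum(solution(t) for t in tests)


def provisional_score(solution, provisional_tests):
    # Shown during the contest; never counts towards the ranking.
    return evaluate(solution, provisional_tests)


def final_scores(submissions, hidden_tests):
    """submissions: list of (user, solution) pairs in submission order."""
    last = {}
    for user, solution in submissions:
        last[user] = solution  # later submissions overwrite earlier ones
    return {user: evaluate(sol, hidden_tests) for user, sol in last.items()}
```

Since `final_scores` touches the hidden tests only after the contest, nothing a contestant sees while competing depends on them.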

alexthelemon · April 19, 2018, 7:34pm

I should correct something I said above. It’s not just the return code that leaks information: another mechanism is the reported memory usage. If you want to discover the number ‘x’ from a hidden test case you just allocate ‘x’ MB in your code then stop. The results page will then show you what ‘x’ was. (I didn’t realise that the memory usage from the results page included that of the hidden cases.)

So a simple change, even if nothing else is changed, would be to make the result page only report the time and memory usage from the provisional test cases, not the hidden test cases.
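The memory channel described above can be sketched as follows. This is a toy model, not the judge's actual measurement: `reported_memory_mb` is a hypothetical stand-in for the figure shown on the results page, and real reported usage would also include interpreter overhead.

```python
MB = 1024 * 1024


def allocate_mb(x):
    # bytearray(n) allocates n zero-filled bytes, so this holds x MB.
    return bytearray(x * MB)


def reported_memory_mb(buffer):
    # Stand-in for the memory figure on the results page (assumption).
    return len(buffer) // MB


# A hidden value x = 7 read from the test input comes back as "7 MB".
leaked = reported_memory_mb(allocate_mb(7))
```

If the results page only reported usage from the provisional cases, the allocation would reveal nothing about the hidden input.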