Invitation to CodeChef April Long Challenge 2018!

In addition, the time and memory information that you get back should only be for the “feedback-instances”.

And I think it is also probably better not to include the feedback-instances in the final score, because too much is known about them. (Another reason: if you do include them, as happens now, then @algmyr’s suggestion doesn’t work properly. That is, even if only the last submitted program is scored in the final ranking, you still have an incentive to keep resubmitting a random algorithm until you get a good visible score, even though you can only see part of your score.)

Personally I think one solution would be to completely separate the provisional tests from the actual tests. Provisional tests would only give a temporary hint of the performance of programs, but would not be included in the actual set of tests. No information should be given on the hidden tests. Since the data generation algorithm is given, you can easily test performance on your own system, and the provisional tests should catch most server-side issues.
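
To make the split concrete, here is a toy sketch of the proposed flow. All names and numbers are made up for illustration, not anything CodeChef actually runs:

```python
# Toy sketch of the proposed split: the provisional pool is run during the
# contest for visible feedback only, while the disjoint hidden pool is run
# once, after the contest, and never reported on while the contest is live.
import random

def run_solution(case_seed):
    """Stand-in for judging one submission on one test case."""
    rng = random.Random(case_seed)
    return rng.uniform(0.9, 1.0)       # pretend per-case score

PROVISIONAL_CASES = range(0, 5)        # visible during the contest
HIDDEN_CASES = range(1000, 1020)       # disjoint pool, scored after the contest

def visible_score():
    return sum(run_solution(s) for s in PROVISIONAL_CASES) / len(PROVISIONAL_CASES)

def final_score():
    return sum(run_solution(s) for s in HIDDEN_CASES) / len(HIDDEN_CASES)

print("visible score (provisional only):", round(visible_score(), 3))
print("final score (hidden only):       ", round(final_score(), 3))
```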

Combined with my earlier comment about only judging the last submission this should both prevent reverse engineering and discourage spam submissions.

I would be happy with that option (no feedback from actual tests at all), but I got the impression people wanted a bit of certainty that their programs would still work with the actual tests, so I made the above suggestion (20 return codes) as a compromise. But your suggestion has the virtue of simplicity, and as you say it’s unlikely a program that passes the provisional tests would fail to complete the actual tests (though you could just about imagine one that runs in 3.96 seconds on the provisional tests but TLEs at 4.01 seconds on the actual tests due to a slight difference in the data).

Judging on the last submission only is tempting, though it might make the comparison a bit more random because the tail of the score distribution obtained from a random algorithm (which you get from maxing over lots of attempts) probably has less variance than a single instance.
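
As a rough illustration of that variance point (assuming, purely for the sake of the example, that a random algorithm’s scores are i.i.d. standard normal):

```python
# Illustrative simulation (not CodeChef data): the max over many random
# attempts is both higher and noticeably less spread out than a single
# attempt, which is what makes the comparison less "random" under maxing.
import random
import statistics

def single_attempt():
    return random.gauss(0, 1)

def best_of(n):
    return max(random.gauss(0, 1) for _ in range(n))

trials = 10000
singles = [single_attempt() for _ in range(trials)]
bests = [best_of(200) for _ in range(trials)]

print("single attempt: mean %.2f, stdev %.2f"
      % (statistics.mean(singles), statistics.stdev(singles)))
print("best of 200:    mean %.2f, stdev %.2f"
      % (statistics.mean(bests), statistics.stdev(bests)))
```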

But this could be fixed by increasing the number of hidden test cases. And this wouldn’t require extra server time compared to what happens now because the server would only need to run a single entry, not all 200. (Though it may delay scoring slightly after the contest if they aren’t being run pre-emptively.)

@algmyr, that’s a very interesting suggestion. We will discuss this. Thanks!

@admin Also check out my comment under Invitation to CodeChef April Long Challenge 2018! - #13 by alexthelemon - general - CodeChef Discuss; it expands the idea into a more complete solution covering both reverse engineering and spam submissions.

@algmyr what is the point of having provisional tests then? You can optimize your solution for them and end up overfitting (as it’s called in ML), and in the end your time will be wasted because the final tests will be completely different. There would be almost no point in checking solutions during the contest, am I wrong? I have another suggestion: what if we use multi-tests in the challenge (around 50-1000 per test case, with different types combined) and feedback is provided on only 5-10% of the data? I guess it would then be hard for unfair solutions to extract the test data. What do you think about that?


Thank you for this informative discussion :slight_smile: We will definitely take these points into consideration while figuring out what to do. We’ll get back to you soon.

Yes, solutions which have T=1 are far more prone to this. With mixed solutions of different kinds, I think we can reduce the issue by a good factor.

@mgch Provisional tests would be there only to give a rough indicator of how you stack up. If you have a solution that performs consistently, it will also be a decent estimate of your final score, similar to what the visible test is today. Even today you have no idea whether the hidden tests are vastly different; you pretty much presume that the visible test is representative already. Also, importantly, you are given the data generation algorithm, so you can generate your own test cases to benchmark your program and see whether it performs well in general.
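
For instance, a local benchmarking loop could look roughly like this (the generator script and solution binary are placeholders I’ve made up, not actual contest tooling):

```python
# Rough local benchmarking loop. "gen.py" (a reimplementation of the
# published data generation algorithm) and "./sol" (your compiled solution)
# are placeholders for illustration only.
import subprocess
import time

times = []
for seed in range(20):
    case = subprocess.run(["python3", "gen.py", str(seed)],
                          capture_output=True, text=True, check=True).stdout
    start = time.perf_counter()
    subprocess.run(["./sol"], input=case, capture_output=True,
                   text=True, check=True)
    times.append(time.perf_counter() - start)

print("worst %.2fs, average %.2fs over %d generated cases"
      % (max(times), sum(times) / len(times), len(times)))
```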

@mgch If you’re worried that the provisional test cases are not representative, you could always add a few more cases of each type to reduce the impact of potential outliers. If the final tests are run after the competition (and only on the final submission), this would still be less computationally intensive than running the full test suite on every submission, as is done today (from what I’ve understood). What I fundamentally want to enforce is a separation between the sets of test cases, so that you can’t gather information on the point-giving tests during the competition.


I should correct something I said above. It’s not just the return code that leaks information: another mechanism is the reported memory usage. If you want to discover the number ‘x’ from a hidden test case you just allocate ‘x’ MB in your code then stop. The results page will then show you what ‘x’ was. (I didn’t realise that the memory usage from the results page included that of the hidden cases.)
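
For concreteness, a sketch of that trick (the input format here is hypothetical; the point is only that the reported peak memory can encode a value from the hidden data):

```python
# Sketch of the memory side channel described above. Suppose the hidden
# test case begins with an integer x (hypothetical format). Allocating
# roughly x MB and then stopping makes the judge's reported peak memory
# usage reveal x, even though no other feedback is given.
import sys

x = int(sys.stdin.readline().split()[0])   # hidden value to exfiltrate
buf = bytearray(x * 1024 * 1024)           # keep ~x MB live until exit
print(0)                                   # any valid-looking output; the signal is the memory stat
```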

So a simple change, even if nothing else is changed, would be to make the result page only report the time and memory usage from the provisional test cases, not the hidden test cases.


Will ask them to take this feedback as well. Though I feel the second step may take time to implement.

@aryanc403 xD. I know what you’re feeling, but usually we test things out and bring changes to contests only when fully sure. So, things may take some time. :slight_smile:

@oleg_b Thanks for elucidating the reasons so well :slight_smile: We are looking into this.

Hi oleg_b! I completely agree with your first three paragraphs.

Maybe it’s not necessary to give the contestant a special choice of solution. Under last-one-counts, if contestants want a different submission as their final answer, they just have to resubmit it so that it becomes the last one.

(I say this because I don’t want to put too much burden on admins to change the UI. Probably the less we ask for, the greater the chance of a change. I know I suggested a UI change earlier, but I’m coming round to algmyr’s POV that last solution + complete separation is the simplest/best method.)


As for reducing randomness, I looked at stats for CHEFPAR to see what might be feasible. In Div 1 there were ~6400 submissions from 274 people, and in Div 2 ~10300 from 1158. The current system (AIUI) evaluated 20*16700 tests. Last-one-counts would evaluate 1432n tests, so the break-even point is n=233 tests. But it’s not quite that simple, because it’s not ideal to have all the server burden at the end of the contest. Probably the server would have to speculatively run some solutions in the hope that they don’t change. So maybe n=100 would be more realistic? Unless there is spare server capacity - I don’t know.
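
Spelling out the arithmetic (just the numbers quoted above):

```python
# Back-of-the-envelope break-even calculation using the CHEFPAR stats above.
submissions = 6400 + 10300       # Div 1 + Div 2 submissions
contestants = 274 + 1158         # Div 1 + Div 2 participants

current_tests = 20 * submissions           # today: every submission runs 20 tests
breakeven_n = current_tests / contestants  # last-one-counts: n tests on one entry each
print(current_tests, contestants, round(breakeven_n))   # 334000 1432 233
```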