Invitation to CodeChef April Long Challenge 2018!

In addition, the time and memory information that you get back should only be for the “feedback-instances”.

And I think it is also probably better not to include the feedback-instances in the final score, because too much is known about them. (Another reason: if you do include them, as happens now, then @algmyr’s suggestion doesn’t work properly. That is, even if only the last submitted program is scored in the final ranking, you still have an incentive to keep resubmitting a random algorithm until you get a good visible score, even though you can only see part of your score.)

Personally I think one solution would be to completely separate the provisional tests from the actual tests. Provisional tests would only give a temporary hint of the performance of programs, but would not be included in the actual set of tests. No information should be given on the hidden tests. Since the data generation algorithm is given, you can easily test performance on your own system, and the provisional tests should catch most server-side issues.
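
To make the split concrete, here is a toy sketch of the proposed flow. All names and numbers are made up for illustration, not anything CodeChef actually runs:

```python
# Toy sketch of the proposed split: the provisional pool is run during the
# contest for visible feedback only, while the disjoint hidden pool is run
# once, after the contest, and never reported on while the contest is live.
import random

def run_solution(case_seed):
    """Stand-in for judging one submission on one test case."""
    rng = random.Random(case_seed)
    return rng.uniform(0.9, 1.0)       # pretend per-case score

PROVISIONAL_CASES = range(0, 5)        # visible during the contest
HIDDEN_CASES = range(1000, 1020)       # disjoint pool, scored after the contest

def visible_score():
    return sum(run_solution(s) for s in PROVISIONAL_CASES) / len(PROVISIONAL_CASES)

def final_score():
    return sum(run_solution(s) for s in HIDDEN_CASES) / len(HIDDEN_CASES)

print("visible score (provisional only):", round(visible_score(), 3))
print("final score (hidden only):       ", round(final_score(), 3))
```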

Combined with my earlier comment about only judging the last submission this should both prevent reverse engineering and discourage spam submissions.

I would be happy with that option (no feedback from actual tests at all), but I got the impression people wanted a bit of certainty that their programs would still work with the actual tests, so I made the above suggestion (20 return codes) as a compromise. But your suggestion has the virtue of simplicity, and as you say it’s unlikely a program that passes the provisional tests would fail to complete the actual tests (though you could just about imagine one that runs in 3.96 seconds on the provisional tests but TLEs at 4.01 seconds on the actual tests due to a slight difference in the data).

Judging on the last submission only is tempting, though it might make the comparison a bit more random because the tail of the score distribution obtained from a random algorithm (which you get from maxing over lots of attempts) probably has less variance than a single instance.
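
As a rough illustration of that variance point (assuming, purely for the sake of the example, that a random algorithm’s scores are i.i.d. standard normal):

```python
# Illustrative simulation (not CodeChef data): the max over many random
# attempts is both higher and noticeably less spread out than a single
# attempt, which is what makes the comparison less "random" under maxing.
import random
import statistics

def single_attempt():
    return random.gauss(0, 1)

def best_of(n):
    return max(random.gauss(0, 1) for _ in range(n))

trials = 10000
singles = [single_attempt() for _ in range(trials)]
bests = [best_of(200) for _ in range(trials)]

print("single attempt: mean %.2f, stdev %.2f"
      % (statistics.mean(singles), statistics.stdev(singles)))
print("best of 200:    mean %.2f, stdev %.2f"
      % (statistics.mean(bests), statistics.stdev(bests)))
```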

But this could be fixed by increasing the number of hidden test cases. And this wouldn’t require extra server time compared to what happens now because the server would only need to run a single entry, not all 200. (Though it may delay scoring slightly after the contest if they aren’t being run pre-emptively.)

@algmyr, that’s a very interesting suggestion. We will discuss this. Thanks!

@admin Also check out my comment under Invitation to CodeChef April Long Challenge 2018! - #13 by alexthelemon - general - CodeChef Discuss; it expands the idea into a more complete solution covering both reverse engineering and spam submissions.

@algmyr what is the point of having provisional tests then? You can optimize your solution for them and end up overfitting (as it’s called in ML), and in the end your time will be wasted because the final tests will be completely different. There would be almost no point in checking solutions during the contest, am I wrong? I have another suggestion: what if we use multi-tests in the challenge (around 50-1000 per test case, with different types combined) and feedback is provided on only 5-10% of the data? I guess it would then be hard for unfair solutions to extract the test data. What do you think about that?


Thank you for this informative discussion :slight_smile: We will definitely take these points into consideration while figuring out what to do. We’ll get back to you soon.

Yes, solutions which have T=1 are far more prone to this. With mixed solutions of different kinds, I think we can reduce the issue by a good factor.

@mgch Provisional tests would be there only to give a rough indicator of how you stack up. If you have a solution that performs consistently, it will also be a decent estimate of your final score, similar to what the visible test is today. Even today you have no idea whether the hidden tests are vastly different; you pretty much presume that the visible test is representative already. Also, importantly, you are given the data generation algorithm, so you can generate your own test cases to benchmark your program and see whether it performs well in general.
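
For instance, a local benchmarking loop could look roughly like this (the generator script and solution binary are placeholders I’ve made up, not actual contest tooling):

```python
# Rough local benchmarking loop. "gen.py" (a reimplementation of the
# published data generation algorithm) and "./sol" (your compiled solution)
# are placeholders for illustration only.
import subprocess
import time

times = []
for seed in range(20):
    case = subprocess.run(["python3", "gen.py", str(seed)],
                          capture_output=True, text=True, check=True).stdout
    start = time.perf_counter()
    subprocess.run(["./sol"], input=case, capture_output=True,
                   text=True, check=True)
    times.append(time.perf_counter() - start)

print("worst %.2fs, average %.2fs over %d generated cases"
      % (max(times), sum(times) / len(times), len(times)))
```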

@mgch If you’re worried that the provisional test cases are not representative, you could always add a few more cases of each type to reduce the impact of potential outliers. If the final tests are run after the competition (and only on the final submission), this would still be less computationally intensive than running the full test suite on every submission, as is done today (from what I’ve understood). What I fundamentally want to enforce is a separation between the sets of test cases, so that you can’t gather information on the point-giving tests during the competition.


I should correct something I said above. It’s not just the return code that leaks information: another mechanism is the reported memory usage. If you want to discover the number ‘x’ from a hidden test case you just allocate ‘x’ MB in your code then stop. The results page will then show you what ‘x’ was. (I didn’t realise that the memory usage from the results page included that of the hidden cases.)
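
For concreteness, a sketch of that trick (the input format here is hypothetical; the point is only that the reported peak memory can encode a value from the hidden data):

```python
# Sketch of the memory side channel described above. Suppose the hidden
# test case begins with an integer x (hypothetical format). Allocating
# roughly x MB and then stopping makes the judge's reported peak memory
# usage reveal x, even though no other feedback is given.
import sys

x = int(sys.stdin.readline().split()[0])   # hidden value to exfiltrate
buf = bytearray(x * 1024 * 1024)           # keep ~x MB live until exit
print(0)                                   # any valid-looking output; the signal is the memory stat
```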

So a simple change, even if nothing else is changed, would be to make the result page only report the time and memory usage from the provisional test cases, not the hidden test cases.


Will ask them to take this feedback as well. Though I feel the second step may take time to implement.

@aryanc403 xD. I know what you’re feeling, but usually we test things out and bring changes to contests only when fully sure. So, things may take some time. :slight_smile:

@oleg_b Thanks for elucidating the reasons so well :slight_smile: We are looking into this.

Hi oleg_b! I completely agree with your first three paragraphs.

Maybe it’s not necessary to give the contestant a special choice of solution. Under last-one-counts, if contestants want a different submission as their final answer, they just have to resubmit it so that it becomes the last one.

(I say this because I don’t want to put too much burden on admins to change the UI. Probably the less we ask for, the greater the chance of a change. I know I suggested a UI change earlier, but I’m coming round to algmyr’s POV that last solution + complete separation is the simplest/best method.)


As for reducing randomness, I looked at stats for CHEFPAR to see what might be feasible. In Div 1 there were ~6400 submissions from 274 people, and in Div 2 ~10300 from 1158. The current system (AIUI) evaluated 20*16700 tests. Last-one-counts would evaluate 1432n tests, so the break-even point is n=233 tests. But it’s not quite that simple, because it’s not ideal to have all the server burden at the end of the contest. Probably the server would have to speculatively run some solutions in the hope that they don’t change. So maybe n=100 would be more realistic? Unless there is spare server capacity - I don’t know.
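
Spelling out the arithmetic (just the numbers quoted above):

```python
# Back-of-the-envelope break-even calculation using the CHEFPAR stats above.
submissions = 6400 + 10300       # Div 1 + Div 2 submissions
contestants = 274 + 1158         # Div 1 + Div 2 participants

current_tests = 20 * submissions           # today: every submission runs 20 tests
breakeven_n = current_tests / contestants  # last-one-counts: n tests on one entry each
print(current_tests, contestants, round(breakeven_n))   # 334000 1432 233
```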