Elo-MMR ratings go live on CodeChef!

admin · July 13, 2022, 5:08am

This is a blog post by Aram and Paul:

Readers of my Codeforces blog may recall that inutard and I have been experimenting with rating systems for quite some time. Last year, we demonstrated how to break the Topcoder system, and published our findings at the World Wide Web 2021 research conference. There, we derived a new Bayesian rating system from first principles. This system was specifically motivated by sport programming, though it may in theory be applied to any sport that ranks lots of contestants. Recently, it was adopted by the Canadian contest judge, DMOJ.

Now, in collaboration with the CodeChef admins, we are pleased to announce that upon the completion of July Lunchtime, the Elo-MMR rating system goes live on CodeChef.com! While recognizing that any rating system migration will be disruptive, we hope that you’ll find the advantages worthwhile. Let’s briefly go over them. Elo-MMR is:

1. Principled: using rigorous, peer-reviewed, Bayesian derivations.
2. Incentive-compatible: it’s never beneficial to score fewer points.
3. Robust: players will never lose too much rating for a single bad day.
4. Open: the algorithm and its implementation are available under open licenses.
5. Fast: on a modern PC, CodeChef’s entire history is processed in under 30 minutes; using close approximations, we’ve further reduced it to under 30 seconds.

In addition, Elo-MMR offers:

6. A better rating distribution: CodeChef and Codeforces have a high spread at the top: for instance, gennady.korotkevich has a 1000+ point lead over CodeChef’s other top users, whereas Elo-MMR brings the gap below 150 points. Conversely, the old systems provide less spread in the low-to-median percentiles, compared to the wider numerical range that Elo-MMR allocates for the majority of users to progress through.
7. Conservative ratings / newcomer boost: Similarly to Microsoft’s TrueSkill, the publicly displayed rating will be a high-confidence lower bound on the player’s actual skill. As a result, rather than starting at the median and moving down, newcomers will start near the bottom and move up as the system becomes more confident in their skills. This provides a sense of progression and achievement.
8. Faster convergence for experts: Compared to the previous rating system, Elo-MMR awards a much higher increase to users who do well in their first CodeChef rounds.

It’s worth noting that no special hacks were taken to ensure these properties: they emerge naturally from the mathematical derivation. For details, please see the latest revision of our research article.

To demonstrate the difference in rating distributions, let’s plot them for CodeChef users, using both the old and new (Elo-MMR) rating systems. The latter’s parameters were chosen in such a way as to make the two rating scales approximately comparable. We see that although the original CodeChef ratings take up a wider range overall, gennady.korotkevich alone occupies a big chunk at the upper end! Elo-MMR, on the other hand, spreads the bottom 80% of the population over a wider range.

[Plot: rating distributions of CodeChef users under the original CodeChef system and under Elo-MMR]

So, how are we handling the migration to Elo-MMR ratings? One option we considered is to simply take the current ratings and apply Elo-MMR updates from now on. This is not ideal, since rating systems can take a long time to converge: for example, consider the rating distribution of any programming contest site in its first year of operation. At the opposite extreme, we can retroactively recompute all contests in CodeChef’s history using Elo-MMR. DMOJ took this approach; it better leverages the site’s history, but the resulting transition is very sharp.

To smooth it out, CodeChef retroactively computed everyone’s current Elo-MMR ratings in the backend, but will continue to show the original CodeChef rating. Every time you compete, your backend Elo-MMR (MM) rating will be updated, and then your public CodeChef (CC) rating will be pulled closer to your MM rating according to the formula:

CC_{\text{after}} = MM_{\text{after}} + 0.75(1 - \tfrac{1}{n})(CC_{\text{before}} - MM_{\text{before}}),

where n is the number of rounds in which you’ve taken part. Thus, new players, who do not yet have a solidified CC rating, are moved to the new system more rapidly. The most experienced players are pulled one-fourth of the way each time they compete.
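As a rough illustration, the update above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes that n counts the round just completed, which the formula does not spell out; it is not CodeChef's actual implementation.

```python
def next_cc_rating(cc_before: float, mm_before: float,
                   mm_after: float, n: int) -> float:
    """Pull the public CC rating toward the backend MM rating.

    n is the number of rounds the player has taken part in
    (assumed here to include the round just completed).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Fraction of the old CC - MM gap that survives this round.
    keep = 0.75 * (1.0 - 1.0 / n)
    return mm_after + keep * (cc_before - mm_before)


# A player in their 100th round keeps about 74% of the old gap.
print(next_cc_rating(2000, 1800, 1820, 100))
# A first-time player (n = 1) lands exactly on their MM rating.
print(next_cc_rating(1500, 900, 1000, 1))
```

Note that with n = 1 the factor is zero, which is why brand-new players move to the new system immediately.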

Having examined the population, we might also ask how much the ratings of specific individuals change between the two systems. To find out, we first eliminate the noisiest data: players who have been caught cheating, and newcomers with at most 10 lifetime contest participations. For all remaining users, we compare their original CodeChef rating to their retroactively computed Elo-MMR rating. The statistics are summarized as follows:

World champion gennady.korotkevich is the main outlier, with CC - MM = 1079. Thus, over the next 10 or so contests that it will take for the new ratings to come into effect, he is expected to lose about 1000 points. Thanks Gennady, for understanding and being fine with this!
Less than 800 additional users have CC - MM > 200, and none of these are over 500. Nonetheless, these users’ ratings will most likely decrease over their next few contests.
Around 8000 users have MM - CC > 200, the highest of them being 630. These players will experience a rating boost over their next few contests.
All other users have |MM - CC| \le 200, so they will experience minimal disruptions.
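To get a feel for how quickly these gaps close, note that the migration formula CC_{\text{after}} - MM_{\text{after}} = 0.75(1 - \tfrac{1}{n})(CC_{\text{before}} - MM_{\text{before}}) shrinks the CC - MM gap by the same factor no matter how the MM rating itself moves. The Python sketch below simply iterates that factor; the function name and the example value of 50 prior rounds are hypothetical, chosen only for illustration.

```python
def remaining_gap(initial_gap: float, rounds_before: int,
                  contests: int) -> float:
    """Return the CC - MM gap left after `contests` more rounds.

    rounds_before is the number of rounds already played before the
    migration started (a hypothetical input for this sketch).
    """
    gap = initial_gap
    for k in range(1, contests + 1):
        n = rounds_before + k  # total rounds played after this contest
        gap *= 0.75 * (1.0 - 1.0 / n)
    return gap


# A 1079-point gap for a player with 50 prior rounds (made-up count).
for k in (1, 5, 10):
    print(k, round(remaining_gap(1079, 50, k), 1))
```

Since the factor never exceeds 0.75, the gap after k contests is at most 0.75^k of its starting value. For a 1079-point gap, that bound is roughly 61 points after 10 contests, consistent with the estimate of 10 or so contests for the new ratings to come into effect.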

For the next few months, the day after every rated contest, we will update the Elo-MMR ratings (which aren’t shown on the user profile) of every user in this drive, so that interested users can investigate their rating trajectories.

After a period of time (less than a year), every user’s rating (including those who aren’t participating in contests now) will be changed fully to their Elo-MMR rating.

We welcome any questions and will do our best to answer them all!

Update - 2nd Aug 2022:

We have introduced Provisional Ratings, which signify that a new user’s rating isn’t yet a reliable indicator of their actual skill during the first few contests that they participate in.

In particular, until a user participates in at least 5 rated contests, their rating will be accompanied by a question mark, indicating that it is a Provisional Rating.
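A minimal sketch of this display rule in Python, assuming the question mark is appended directly after the number (the exact placement and the names used here are assumptions, not CodeChef's code):

```python
PROVISIONAL_CONTESTS = 5  # threshold stated in the announcement


def display_rating(rating: int, rated_contests: int) -> str:
    """Format a rating, marking it provisional with a '?' suffix."""
    suffix = "?" if rated_contests < PROVISIONAL_CONTESTS else ""
    return f"{rating}{suffix}"


print(display_rating(1432, 3))  # still provisional
print(display_rating(1432, 5))  # no longer provisional
```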

Update 2 - 5th Aug 2022:

After launching Elo-MMR, we got feedback from a lot of users about the sudden increase in ratings of new users, particularly in their first couple of contests. This was supposed to be a feature, but the effect turned out to be too strong. So we have now introduced a change in which the weights of a new user’s first and second contests are reduced to 60% and 80% of their old values, respectively. This leads to a slower increase in the ratings of new users.
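The scaling described above can be sketched as a simple lookup. This only captures the 60% and 80% multipliers; how the weight enters the Elo-MMR update itself is not described in this post, and the function name is hypothetical.

```python
def contest_weight_factor(contest_number: int) -> float:
    """Multiplier applied to the old weight of a user's k-th rated contest."""
    if contest_number < 1:
        raise ValueError("contest_number starts at 1")
    # First contest: 60%, second: 80%, all later contests unchanged.
    return {1: 0.6, 2: 0.8}.get(contest_number, 1.0)


print([contest_weight_factor(k) for k in range(1, 5)])
```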

This change has been applied for all users and their Elo-MMR rating recalculated from the beginning of time, and hence most users will see a change in their display ratings. More details can be found here.

admin · July 13, 2022, 5:16am

Cross-posted from Elo-MMR ratings go live on CodeChef! - Codeforces

Huge shoutouts to Aram and Paul for patiently answering all our queries, and helping us in the transition over the last 4 months!

termii · July 16, 2022, 8:29pm

Less than 800 additional users have CC - MM > 200, and none of these are over 500. Nonetheless, these users’ ratings will most likely decrease over their next few contests.

I assume these are mostly 6&7 star users. I fail to see why you don’t adjust their ratings now. If they don’t participate, they will end up having a better rating than those participating. Which means those that participate in contests are indirectly punished.

I would have preferred if you replaced the CC ratings of all users with their Elo-MMR. It may be confusing in the beginning, but your approach seems to be confusing long-term. Which is worse.

ephiram_renais · July 16, 2022, 8:48pm

Precisely, it does not make sense to me. People can just choose to not participate and have good ratings which sucks

admin · July 16, 2022, 8:56pm

Among those ~800 users, less than 25 are currently 6 or 7 stars.

Everyone’s rating will be force-changed to MM after a while.

admin · July 17, 2022, 8:22pm

The drive has been updated with LTIME110 ratings.

hrsh_panwar · July 17, 2022, 8:51pm

Seems like this Elo system draws a lot from past contest history. renatyss’s MMR jumped from 2339 to 2434 while my MMR changed from 2271 to 2350, even though I had a better rank than renatyss in today’s Lunchtime. So you can get a greater delta from a higher MMR even with a worse rank. Why are these two ratings moving further apart?

anon84382313 · July 18, 2022, 6:06am

Hi admin, I have found a new problem on CodeChef. With the new interface, I forgot that there is a discussion forum for every problem: it is now tucked away in a corner as a “discuss forum” link in the solution tab, whereas in the old interface it appeared right in the solution section, where it was clearly visible. Please keep the discussion forum below the editorial instead of behind a new link. I am not the only one; many people find it hard to remember that the discussion forum exists. You can observe that far less discussion is happening with the new interface than with the old one. The discussion forum really helped me learn new ideas and techniques and see where I had gone wrong in my code; now I am seeing no discussion. Please rectify this issue.

tushar_78165 · July 18, 2022, 9:22am

What if someone’s MM > CC and they choose not to participate in any contest? Will their CC also be replaced by their MM?

ebtech · July 18, 2022, 9:30am

Yes, the internal state is a bit more complicated than just \mu,\sigma. It’s likely that your friend had already acquired evidence of being at the higher skill level, maybe due to some recent high performances. The system assigns less weight to outlier performances, but may increase that weight if repeat wins demonstrate it not to be a fluke.

admin · July 18, 2022, 9:59am

Yes - “After a period of time (less than a year), every user’s rating (including those who aren’t participating in contests now) will be changed fully to their Elo-MMR rating.”

ssvb · July 18, 2022, 9:02pm

Faster convergence for experts: Compared to the previous rating system, Elo-MMR awards a much higher increase to users who do well in their first CodeChef rounds.

I can’t call myself an expert, but I was not a complete beginner when I participated in my first CodeChef contest. Still, my performance was far from optimal (rank 302 in division 3), largely because I wasn’t familiar with the UI of the CodeChef platform during that contest. Now I wonder, how much did this blunder affect my current Elo-MMR score? Does Elo-MMR quickly adapt or does it assign a high weight to the first contests, which sticks longer?

Also even complete beginners may improve really fast after only a few contests thanks to learning new tricks and algorithms (there are a lot of low hanging fruits). Does the new rating system now penalize beginners, making the results of their first contests more important?

ebtech · July 18, 2022, 11:59pm

The weight of the first contests is not increased. Instead, the weight of the provisional starting rating is decreased. Glicko and TrueSkill do essentially the same thing.

ssvb · July 19, 2022, 6:46pm

The weight of the first contests is not increased. Instead, the weight of the provisional starting rating is decreased.

Isn’t this basically the same thing? What you are saying sounds like the new system essentially replaces the initial provisional rating with the performance of the first contests. Making the first contests very important, considering that the system is also “Robust: players will never lose too much rating for a single bad day” (“players will never gain too much rating for a single good day”), which may indicate that the system adapts slowly after the initial bootstrap.

If the system converges very fast for new accounts but then adapts very slowly, then this may provide a strong incentive for account rerolling (to get a more favorable starting position).

Could you please calculate Elo-MMR scores of CodeChef users after discarding the results of their first contest and share the results on drive? I would like to confirm whether the impact of the first contest still persists many contests later or not.

BTW, looks like your account may be another example of an expert (you were candidate master on codeforces in 2012), who happened to have bad performance in their first CodeChef contest.

ebtech · July 19, 2022, 7:41pm

It’s not the same thing: a user who has done a lot of contests would have almost zero weight on either the provisional rating or the first few contests. For such a user, feature #8 is completely irrelevant.

If you’re worried about robustness, the system does have a bit of a memory. A player who is consistently overperforming (or underperforming), or who performs inconsistently, would be allowed larger updates, whereas a very consistent player’s rating would be less sensitive. It’s basically a gentler version of the “volatility” parameter in the older systems, but carefully designed to retain incentive-compatibility.

termii · July 19, 2022, 8:07pm

Did you consider that the system may grant too high of a “newbie” bonus?
look at this user:

#24 Div4 and #5 in Div4. 2 contests total and his MMR is 2166. I can easily score better than him but I cannot reach 2166 MMR. Reaching almost 6* with only Div 4 contests can’t be correct.

ebtech · July 20, 2022, 12:44am

To be fair, this user wouldn’t normally have been placed into Div 4: their Elo-MMR rating after the first contest was already too high.

The #4 placer is a better example of a user who got a huge boost on their first contest, to 1966. Is that too high? Well, keep in mind that Topcoder rated Petr at 1866 after his first round, which was half as long as a CodeChef round (and I think they use a smaller \mu of 1200 instead of 1500, though Elo-MMR has the 3\sigma penalty from feature #7 instead…)

Now, I don’t have enough experience with Div 4 contests to judge how fair this is. The rating systems are not aware of problem difficulty; they just assume that the problems are sufficiently calibrated to accurately rank all the contestants. If the user had done well in Div 2 or 1 instead, I imagine you’d be less surprised.

One approach we could take is to cap performance scores from Div 4 contests to something like 2000. I don’t know if that’s necessary, but it would be interesting to see if these users actually turn out to be overrated when they later go on to compete in Div 1-2.

nikhil_kumar21 · July 21, 2022, 7:24am

@admin want to say something? Now everyone will make a new account and get a 5 or 6 star rating with ease. All that matters is the highest rating, which we will write in our CV. Soon CC rating won’t be considered for evaluating someone’s skill.

akiitb2022 · July 25, 2022, 12:01pm

Watch his profile. igor_y.

He almost went to 5* just for getting Rank 1 in a single Div.3 Long Challenge.

Is it really good to have so much rating change in a single contest?

Shouldn’t there be an upper limit? Otherwise everyone would just make a new account and try to top a Long Challenge.