IMO 2025: old and new friends, the Sunshine Coast, and AI
I recently volunteered to help with the problem setting and grading of the International Mathematical Olympiad, which was hosted by my home country Australia on the beautiful Sunshine Coast this year! It was a once-in-a-lifetime experience; I reconnected with friends whom I hadn’t seen in 10+ years and made many new friends too. I will remember chanting “Aussie Aussie Aussie, Oi Oi Oi” led by the Australian team, and the Australian olympiad alumni proudly gathering around to photograph this year’s team after the closing ceremony. A massive congratulations to the team – they placed 15th and came away with 2 gold, 2 silver, and 2 bronze medals!
Some of you may have recently heard about AI companies making announcements about their models’ performance on the IMO problems. I am all for technological progress, and I have been excited to see the impressive developments in AI reasoning capabilities over the last few years. This is important research that I am sure will benefit the world and the maths community in the years to come. However, the way AI companies have conducted themselves around this year’s IMO is disappointing, and I hope that they will do better going forward.
One such announcement was made on X/Twitter by an AI company within one hour of the students receiving their medals at the IMO closing ceremony. These students have worked tremendously hard, and their accomplishments – achieved without the aid of the entirety of the Internet or immense computational resources – deserve the sole limelight for much longer than one hour. Here are some examples of the impact this has had:
- Immediately after the closing ceremony, when we were at a dinner celebrating the contestants and the IMO, employees of another AI company approached IMO graders, laptops in hand, asking us to grade their model’s solutions then and there.
- Competing AI companies have since rushed to announce the results of their models as well. Unfortunately, we will likely see more such announcements in the coming days.
- Media coverage of this year’s IMO has since focused heavily on the claimed performances of AI models on the problems – coverage that would otherwise likely have been directed towards the students’ achievements.
Also, benchmarking AI models against competitions such as the IMO should be done in a scientific and controlled environment, with parameters that are publicly specified and dictated by an arbiter independent of the AI companies themselves. In particular, any claims about AI models’ solutions being “officially graded” are misrepresentations at best; consulting a couple of individual coordinators/graders for verification is very different from having submissions graded through channels endorsed by all official personnel, including those who led the IMO problem setting and coordination.
Terry Tao has written a nice post on this, which I encourage you to read. I will add that the IMO exam fluctuates heavily in content and composition from year to year, which means that a perfect apples-to-apples comparison of AI models now vs. a year ago is essentially impossible. Benchmarking AI models on the IMO exam is of course useful; I am just wary of unwarranted extrapolations about any broader implications of these results.
Many thanks to my fellow graders at the IMO for a wonderful few weeks and for their perspectives on this!