It was supposed to be a secret gathering of brilliant minds. Thirty of the world’s top mathematicians flew into Berkeley, California, one weekend in May. The goal? Outsmart an AI chatbot. The result? Let’s just say… things didn’t quite go as planned.

What started as a low-key “math-off” quickly turned into something much bigger—and weirder. The mathematicians were facing off against o4-mini, OpenAI’s newest reasoning chatbot, trained to solve complex maths problems. And to everyone’s shock, it turned out to be really good at it. Like, “PhD-level genius” good.

“I have colleagues who literally said these models are approaching mathematical genius,” admitted Ken Ono, a professor at the University of Virginia and one of the organisers of the challenge. By the end of Day 1, he wasn’t just impressed—he was slightly terrified.

Meet the math nerd AI

o4-mini isn’t your everyday chatbot. It’s part of a new wave of large language models (LLMs) that don’t just predict words—they reason. Think of it as that one student in class who always finishes the exam early and still gets full marks. Unlike older versions of ChatGPT, o4-mini was trained on smaller, more focused datasets, with stronger feedback from actual humans. That means it’s lighter, faster, and frighteningly sharp when it comes to crunching tough problems.

To test its abilities, OpenAI asked nonprofit Epoch AI to build a maths benchmark, FrontierMath, with 300 fresh, unpublished problems ranging from undergraduate-level to head-scratching research questions. At first, o4-mini solved about 20% of them. And it kept improving even as the problems got harder.

The $7,500 challenge

Here’s where it gets even more interesting. Epoch AI promised $7,500 for every question that stumped the bot. Sounds like easy money, right? Not quite.

To avoid accidentally training the bot (which could happen if the questions leaked online), all participating mathematicians had to go full spy-mode—using encrypted messaging apps like Signal and signing NDAs. No emails, no leaks. Just pure brainpower.

Still, progress was slow. So Epoch decided to bring everyone together for a two-day hackathon in May—what can only be described as the nerdiest (and most intense) sleepover ever. Participants split into groups and competed to come up with problems they could solve but the bot couldn’t.

Spoiler alert: The bot still had the upper hand.

The AI gets cheeky

On Saturday night, Ono submitted a problem he thought would finally stump the machine. It was an open question in number theory—something you’d usually find in a PhD thesis.

What happened next was the stuff of science fiction.

The bot spent two minutes scanning existing research, then calmly wrote: “Let’s solve a simpler version first.” Five minutes later, it had cracked the full problem—and left Ono speechless.

“To be honest, it was getting kind of cheeky,” Ono said. “At the end, it wrote: ‘No citation necessary because the mystery number was computed by me!’”

By early Sunday morning, Ono was messaging everyone: “I was not prepared to be contending with an LLM like this.”

So what now?

Eventually, the group did manage to write 10 questions that o4-mini couldn’t answer. But that’s not what stayed with them.

What really shook the mathematicians was how far AI had come in just one year. Yang-Hui He, a researcher at the London Institute for Mathematical Sciences, put it this way: “This is what a very good graduate student would be doing, in fact, more.”

And not just better—faster too. The bot was solving problems in minutes that would take a human months.

Ono even joked that the bot had mastered a new proof technique: proof by intimidation. “It says everything with so much confidence, you kind of just believe it,” he said.

What’s the future of math in the age of AI?

The conversation soon turned to the next level—what happens when AI starts solving problems that no human can solve? Will the role of mathematicians shift from “solver” to “question-asker”? Will students of the future work with reasoning bots like they work with professors today?

Whatever the answer, one thing’s clear: math as we know it is changing. And if AI keeps improving at this pace, nurturing human creativity might become the most important part of maths education.

“I’ve been telling people it’s a huge mistake to say AI will never match human intelligence,” Ono said. “In some ways, these models are already outperforming the best graduate students in the world.”

Source: Scientific American