Key Takeaways
In the ChatGPT era, OpenAI has consistently topped AI leaderboards, with each new model raising the bar in areas such as mathematical reasoning, coding, and long-context understanding.
However, rivals are closing the gap, especially new models emerging from China that have clocked some impressive benchmark scores.
Amid rising competition, OpenAI has come under fire for secretly funding and accessing a benchmarking dataset, raising questions over the legitimacy of its performance claims.
The benchmark at the heart of the controversy is FrontierMath—a set of hundreds of difficult math problems specifically designed to push the limits of what is possible with AI.
FrontierMath was created as a way to challenge frontier AI models that already achieve near-perfect scores on easier tests.
OpenAI commissioned the new benchmark. However, researchers, including mathematicians who contributed to the project, were kept in the dark about the company’s involvement until recently.
Worse still, OpenAI had access to all but 50 of the problems in the dataset. However, the company left that detail out when it boasted about its latest model’s 25% score on FrontierMath.
Epoch AI, the research institute that developed the benchmark, has sought to pass the blame on to OpenAI, highlighting clauses in the contract that prevented full disclosure.
Moreover, Epoch director Tamay Besiroglu insisted that OpenAI made a “verbal agreement” not to use FrontierMath problems for AI training.
However, given that OpenAI’s o3 reportedly scored more than ten times higher than other leading models, many are now suspicious of those results.
Crucially, the furor over FrontierMath comes at a time when OpenAI’s models no longer dominate AI rankings as they once did.
Alongside traditional rivals like Google and Meta, a crop of upstart Chinese players is now snapping at OpenAI’s heels.
On Monday, Jan. 20, Hangzhou-based DeepSeek released a new open-source model, R1, that it claims beats OpenAI’s o1 on the AIME, MATH-500, and SWE-bench benchmarks.
Then, on Wednesday, ByteDance unveiled Doubao 1.5 Pro, a new language model that delivers GPT-4o-level performance with just a fraction of the computing power.
Gone are the days when Chinese AI models were little more than cheap knockoffs of their American counterparts.
Doubao 1.5 Pro and DeepSeek R1 prove that OpenAI’s technological supremacy is far from secured. And if the company’s AI crown is usurped, it may not be an American rival that does it.