
Elon Musk Claims Grok 3 Outperforms All Chatbots After xAI Engineer Resigns Over AI Ranking Dispute

James Morales
Key Takeaways
  • Elon Musk has said that the unreleased Grok 3 outperforms all other currently released chatbots.
  • However, xAI engineer Benjamin De Kraker recently ranked Grok below OpenAI’s models for coding ability.
  • De Kraker claimed his employer asked him to remove the post or face being fired.

Unveiling xAI’s latest chatbot, Grok 3, on Monday, Feb. 17, Elon Musk boasted of the model’s high benchmark scores and its rise to the top spot on Chatbot Arena.

However, he watered down his previous statements that Grok 3 outperforms all other AIs on the market.

That claim is disputed by an xAI engineer, who said he was recently forced out for ranking Grok behind other leading AI models on social media.

Musk Unveils Grok 3 Benchmarks

In an X live feed on Monday, Musk was joined by xAI’s top engineers to discuss the latest iteration of their AI model.

Musk described the new AI as being “an order of magnitude more capable than Grok 2” and showed off its superior benchmark performance compared to some of its peers.

However, his statement at the Dubai World Governments Summit on Thursday, Feb. 13, that Grok 3 outperforms “anything that’s been released” must be taken with a grain of salt.

Grok 3’s standout achievement is its high score on Chatbot Arena. The model currently holds the top spot on the leaderboard, which ranks AI models based on blind user tests.

But when it comes to other benchmarks used to evaluate AI performance, xAI’s model falls behind OpenAI’s o1, and the currently unreleased o3.

For example, on the American Invitational Mathematics Examination (AIME), Grok 3 reportedly scored 54%, above GPT-4o but below o1 and DeepSeek R1, which scored 83.3% and 79.8% respectively.

OpenAI claims its unreleased o3 model performs even better with 96.7% accuracy on the AIME evaluation.

While benchmarks make AI comparison possible, static performance metrics are at odds with increasingly dynamic AI development. Musk said that, unlike with its predecessors, xAI will be “continuously improving” Grok 3 going forward.

xAI Engineer Ranks Grok 3 Below OpenAI Models

In an X post on Feb. 8, Benjamin De Kraker, who worked on the human data team for Grok development, shared his opinion on the top AI models for coding.

In De Kraker’s view, OpenAI’s o1-pro, o1 and o3-mini are all tied for the top spot. He placed Grok 3 in fourth position, followed by DeepSeek R1 and GPT-4o.

“Delete the Post or Face Being Fired”

On Thursday, Feb. 12, De Kraker issued a follow-up post in which he claimed, “xAI told me I either had to delete [the post] or face being fired.”

Facing such an ultimatum, De Kraker said he decided to retain his “free speech and dignity” and resign.

“It’s very disappointing to me that a company and leaders who supposedly champion free speech and openness would try to fire a low-level employee over a clearly labeled opinion that contains absolutely nothing controversial,” he added.

De Kraker’s predicament hasn’t gone unnoticed by xAI’s top brass. Commenting on the most recent post, CEO Musk said, “That’s weird.” However, he didn’t indicate whether he planned to intervene or not.


James Morales

Although his background is in crypto and FinTech news, these days, James likes to roam across CCN’s editorial breadth, focusing mostly on digital technology. Having always been fascinated by the latest innovations, he uses his platform as a journalist to explore how new technologies work, why they matter and how they might shape our future.