Unveiling xAI’s latest chatbot, Grok 3, on Monday, Feb. 17, Elon Musk boasted of the model’s high benchmark scores and its ascendency to pole position on Chatbot Arena.
However, he watered down his previous statements that Grok 3 outperforms all other AIs on the market.
That claim is disputed by a former xAI engineer, who said he was recently forced out for posting a ranking on social media that placed Grok behind other leading AI models.
In an X live feed on Monday, Musk was joined by xAI’s top engineers to discuss the latest iteration of their AI model.
Musk described the new AI as being “an order of magnitude more capable than Grok 2” and showed off its superior benchmark performance compared to some of its peers.
However, his statement at the Dubai World Governments Summit on Thursday, Feb. 13, that Grok 3 outperforms “anything that’s been released” must be taken with a grain of salt.
Grok 3’s standout achievement is its high score on Chatbot Arena. The model currently holds the top spot on the leaderboard, which ranks AI models based on blind user tests.
But on other benchmarks used to evaluate AI performance, xAI’s model falls behind OpenAI’s o1 and the currently unreleased o3.
For example, on the American Invitational Mathematics Examination (AIME), Grok 3 reportedly scored 54%, above GPT-4o but below o1 and DeepSeek R1, which scored 83.3% and 79.8% respectively.
OpenAI claims its unreleased o3 model performs even better with 96.7% accuracy on the AIME evaluation.
While benchmarks make AI comparison possible, static performance metrics are at odds with increasingly dynamic AI development standards. Musk said that, unlike with its predecessors, xAI will be “continuously improving” Grok 3 going forward.
In an X post on Feb. 8, Benjamin De Kraker, who worked on the human data team for Grok development, shared his opinion on the top AI models for coding.
In De Kraker’s view, OpenAI’s o1-pro, o1 and o3-mini are all tied for the top spot. He placed Grok 3 in fourth position, followed by DeepSeek R1 and GPT-4o.
On Feb. 12, De Kraker issued a follow-up post in which he claimed, “xAI told me I either had to delete [the post] or face being fired.”
Facing such an ultimatum, De Kraker said he decided to retain his “free speech and dignity” and resign.
“It’s very disappointing to me that a company and leaders who supposedly champion free speech and openness would try to fire a low-level employee over a clearly labeled opinion that contains absolutely nothing controversial,” he added.
De Kraker’s predicament hasn’t gone unnoticed by xAI’s top brass. Commenting on the most recent post, CEO Musk said, “That’s weird.” However, he didn’t indicate whether he planned to intervene.