ChatGPT Update Made it Worse? OpenAI’s Performance Dips Amid User Complaints

Last Updated March 13, 2024 1:49 PM

By Samantha Dunn

OpenAI is facing criticism over the performance of ChatGPT. | Credit: Kent Nishimura / Getty Images

Key Takeaways

The popular large language model (LLM) ChatGPT faces complaints over a reported decrease in performance.
Users on social platforms share their negative experiences, leading to speculation about ChatGPT’s responses.
New research provides insight into why the quality of responses from ChatGPT and other LLMs may be decreasing.

ChatGPT is under scrutiny as users report a decline in its performance. Many have taken to social platforms to express dissatisfaction, leading to speculation about the apparent dip in quality.

New research may help explain this growing concern, providing surprising insights into why LLM-based systems can deliver worse answers.

Users Voice Concerns

Since OpenAI’s early demo of ChatGPT on November 30, 2022, the online community has been vocal about their experiences with the LLM. With every update, user feedback has been broadly shared on social platforms.

Recently, however, users have reported a drop in the accuracy and quality of ChatGPT’s responses. A growing number of posts offer anecdotal evidence of ChatGPT failing to provide satisfactory responses, and even asking users to do their own research.

Is ChatGPT getting worse for anyone else? I simply asked it to review a document and output requirements found in that document and it refuses saying it can't do that type of work. Then give me steps on how *I* can do it.

REALLY @OpenAI what's the point of a LLM if I have to…

— RAG (@RichardAGetz) March 5, 2024

One X user shared his experience of ChatGPT, noting: “Why is chatgpt getting worse rather than better? I used to be able to use it to prep for my business meetings with new clients and now it essentially just tells me to go do my own research!”

Another user added: “Slowly getting better at not taking all day to fix one bug in my scripts. Now I take all day to fix 3! Being forced to get better at coding by myself because chatgpt is somehow getting worse over time…”

Growing Pains in AI systems?

A Reddit thread about ChatGPT’s decreasing performance brought up the point that people’s subjective experiences posted online may reflect more on human perception flaws than on changes in ChatGPT’s performance.

Since the custom chatbot update on November 6, 2023, people have been wondering whether the update has made ChatGPT worse.

“I was using chatgpt quite a lot before the update and it was great. Since then, it has been writing nonsense on repeat,” a Redditor remarked.

Given OpenAI currently faces several lawsuits, one Reddit user remarked upon the “insanity” happening at the company.

“Yes it does seem to be having more issues recently, but I don’t know if the issue is with the new things I’m trying or the updates. Given the insanity happening at that company right now, it’s a damn miracle that it’s working at all.”

Nonetheless, this wave of user feedback has fueled ongoing debates about the model’s reliability and effectiveness, putting pressure on OpenAI to address these concerns.

Research May Provide Answers

Researchers from Stanford University, UC Berkeley, Princeton University, and Google have explored the impact of varying the number of calls to Large Language Models (LLMs) within compound AI systems.

These systems call LLMs multiple times and aggregate their responses via majority voting.

Surprisingly, the study finds that the performance of these systems increases with the number of LLM calls initially but decreases beyond a certain point.

“Many recent state-of-the-art results in language tasks were achieved using compound systems that perform multiple Large Language Model (LLM) calls and aggregate their responses. However, there is little understanding of how the number of LLM calls – e.g., when asking the LLM to answer each question multiple times and taking a consensus – affects such a compound system’s performance.”

This surprised us. If you call #chatgpt multiple times and take the consensus, the quality of the answer can get worse as the # of #LLM calls increases.

We explain why this happens in our new paper and also show how to estimate the optimal # of LLM calls https://t.co/bqy2IPGb2e https://t.co/rTDV3RayiG pic.twitter.com/s5ZP0avdUl

— James Zou (@james_y_zou) March 6, 2024

Expert Advice: How to Optimize LLM Use

CCN spoke with Professor Zou, co-author of the paper, who revealed that these insights extend beyond LLMs to other AI models, including those used in computer vision.

Professor Zou said his team is exploring how these scaling laws might apply to hybrid AI systems that combine LLMs with other reasoning methods, potentially offering a predictive framework for a range of AI applications.

“As tasks become more complex and heterogeneous (i.e. some subtasks are easy, others hard), we will see more cases where increasing the number of LLM calls can actually lead to worse outcomes on average for compound AI systems,” Professor Zou said.

The paper theorizes there is an optimal number of LLM calls to maximize performance. This finding challenges the assumption that simply increasing LLM calls will always enhance compound AI systems’ effectiveness.
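To make the idea concrete, here is a minimal toy sketch of why majority voting over a mix of easy and hard queries can peak and then decline. It is not the paper's model or code. Every number in it is an assumption chosen for illustration: 60% of queries are "easy" (a single call is right 85% of the time), 40% are "hard" (a single call is right 40% of the time), answers are treated as simply right or wrong, and calls are treated as independent.

```python
from math import comb


def majority_accuracy(p: float, k: int) -> float:
    """Chance that a strict majority of k independent calls is correct.

    Assumes k is odd and each call is correct with probability p.
    """
    need = k // 2 + 1  # smallest number of correct calls that wins the vote
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(need, k + 1))


def system_accuracy(k: int, easy_share: float = 0.6,
                    p_easy: float = 0.85, p_hard: float = 0.4) -> float:
    """Average accuracy over a mix of easy and hard queries (all values assumed)."""
    return (easy_share * majority_accuracy(p_easy, k)
            + (1 - easy_share) * majority_accuracy(p_hard, k))


def best_call_count(max_k: int = 25) -> int:
    """Odd number of calls up to max_k with the highest average accuracy."""
    return max(range(1, max_k + 1, 2), key=system_accuracy)


for k in (1, 3, 5, 7, 9, 25):
    print(k, round(system_accuracy(k), 4))
print("best number of calls:", best_call_count())
```

Under these assumed numbers, extra calls help the easy queries, where the majority is usually right, but hurt the hard ones, where the majority is usually wrong. The average therefore rises from one call to five and falls after that. Different assumptions would move the peak or remove it entirely.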

“For critical applications where quality is important (e.g. medicine), then it’s worth making a few LLM calls on each task to increase performance. In less critical settings where we need to process a large number of tasks (e.g. summarizing many documents) or when speed is important, then it’s fine to use fewer calls,” Professor Zou advised.

What about ChatGPT’s Competitors?

Google, Microsoft, Anthropic, and several other tech companies have released their own LLMs, each promising advanced functionalities and applications.

Some X users say they now prefer Google’s Gemini, previously Bard, over ChatGPT. One game developer explained: “Gemini has been great so far – produces better search results and writing isn’t as rigid than ChatGPT.”

Is it just me or is ChatGPT getting *worse*? Spending more and more time in Gemini right now.

— Simon 💪🏼🐻🕹 (@skilllevel7) March 4, 2024

Anthropic claims that its newly launched AI model Claude 3 “outperforms its peers on most of the common evaluation benchmarks for AI systems.” Anthropic said the new model “exhibits near-human levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence.”

Despite such claims, all emerging LLMs face increasing scrutiny as their user base grows. With high promises, users understandably expect high-quality results.

Given that ChatGPT remains the most widely used LLM, it isn’t surprising that it is the focus of user feedback. Other LLMs also face unique challenges as they compete for AI dominance.
