ChatGPT Update Made it Worse? OpenAI’s Performance Dips Amid User Complaints

Samantha Dunn

Key Takeaways

  • ChatGPT, the popular large language model (LLM), faces complaints over a reported decrease in performance.
  • Users on social platforms share their negative experiences, leading to speculation about ChatGPT’s responses.
  • New research provides insight into why the quality of responses from ChatGPT and other LLMs may be decreasing.

ChatGPT is under scrutiny as users report a decline in its performance. Many have taken to social platforms to express dissatisfaction, leading to speculation about the apparent dip in quality.

New research may help explain this growing concern, offering surprising insights into why quality can drop.

Users Voice Concerns

Since OpenAI’s early demo of ChatGPT on November 30, 2022, the online community has been vocal about its experiences with the LLM. With every update, users have shared their feedback widely on social platforms.

Recently, users have reported a drop in the accuracy and quality of ChatGPT’s responses. A growing number of posts offer anecdotal evidence of ChatGPT failing to provide satisfactory answers, and even telling users to do their own research.

One X user shared his experience of ChatGPT, noting: “Why is chatgpt getting worse rather than better? I used to be able to use it to prep for my business meetings with new clients and now it essentially just tells me to go do my own research!”

Another user added: “Slowly getting better at not taking all day to fix one bug in my scripts. Now I take all day to fix 3! Being forced to get better at coding by myself because chatgpt is somehow getting worse over time…”

Growing Pains in AI Systems?

A Reddit thread about ChatGPT’s decreasing performance raised the point that subjective experiences posted online may reflect flaws in human perception more than changes in ChatGPT’s performance.

Since the custom chatbot update on November 6, 2023, people have been wondering whether it has made ChatGPT worse.

“I was using chatgpt quite a lot before the update and it was great. Since then, it has been writing nonsense on repeat,” a Redditor remarked.

With OpenAI currently facing several lawsuits, another Reddit user pointed to the “insanity” happening at the company.

“Yes it does seem to be having more issues recently, but I don’t know if the issue is with the new things I’m trying or the updates. Given the insanity happening at that company right now, it’s a damn miracle that it’s working at all.”

Nonetheless, this wave of user feedback has fueled ongoing debates about the model’s reliability and effectiveness, putting pressure on OpenAI to address these concerns.

Research May Provide Answers

Researchers from Stanford University, UC Berkeley, Princeton University, and Google have explored the impact of varying the number of calls to large language models (LLMs) within compound AI systems.

These systems call LLMs multiple times and aggregate their responses via majority voting.
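The aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation: `call_llm` is a hypothetical stand-in for whatever function sends a prompt to a model and returns its answer as a string, and the tie-breaking rule (the answer seen first wins) is an assumption.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most common answer; on a tie, the answer seen first wins."""
    if not answers:
        raise ValueError("need at least one answer to vote on")
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(answers).most_common(1)[0][0]


def ask_with_voting(
    question: str,
    call_llm: Callable[[str], str],  # hypothetical: prompt in, answer out
    num_calls: int,
) -> str:
    """Ask the same question num_calls times and return the consensus answer."""
    if num_calls < 1:
        raise ValueError("num_calls must be at least 1")
    answers = [call_llm(question) for _ in range(num_calls)]
    return majority_vote(answers)
```

In practice the answers usually need normalizing (for example, trimming whitespace or extracting a final number) before they are counted, otherwise equivalent answers are treated as different votes.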

Surprisingly, the study finds that the performance of these systems initially increases with the number of LLM calls but decreases beyond a certain point.

“Many recent state-of-the-art results in language tasks were achieved using compound systems that perform multiple Large Language Model (LLM) calls and aggregate their responses. However, there is little understanding of how the number of LLM calls – e.g., when asking the LLM to answer each question multiple times and taking a consensus – affects such a compound system’s performance.”

Expert Advice: How to Optimize LLM Use

CCN spoke with Professor Zou, co-author of the paper, who revealed that these insights extend beyond LLMs to other AI models, including those used in computer vision.

Professor Zou said his team is exploring how these scaling laws might apply to hybrid AI systems that combine LLMs with other reasoning methods, potentially offering a predictive framework for a range of AI applications.

“As tasks become more complex and heterogeneous (i.e. some subtasks are easy, others hard), we will see more cases where increasing the number of LLM calls can actually lead to worse outcomes on average for compound AI systems,” Professor Zou said.

The paper theorizes there is an optimal number of LLM calls to maximize performance. This finding challenges the assumption that simply increasing LLM calls will always enhance compound AI systems’ effectiveness.
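A toy calculation shows how such an optimum can arise. The sketch below is not taken from the paper: it assumes binary answers, independent calls, an odd number of calls (so there are no ties), and made-up numbers, with half the questions "easy" (each call is right 85% of the time) and half "hard" (each call is right only 45% of the time). Voting helps on the easy questions and hurts on the hard ones, so the combined accuracy rises at first and then falls.

```python
from math import comb


def vote_accuracy(p_correct: float, num_calls: int) -> float:
    """Chance that a strict majority of independent calls is correct.

    Assumes binary answers and an odd num_calls, so ties cannot occur.
    """
    needed = num_calls // 2 + 1
    return sum(
        comb(num_calls, j) * p_correct**j * (1 - p_correct) ** (num_calls - j)
        for j in range(needed, num_calls + 1)
    )


def system_accuracy(
    num_calls: int,
    easy_share: float = 0.5,  # assumed fraction of easy questions
    p_easy: float = 0.85,     # assumed per-call accuracy on easy questions
    p_hard: float = 0.45,     # assumed per-call accuracy on hard questions
) -> float:
    """Average accuracy over a mix of easy and hard questions."""
    return (
        easy_share * vote_accuracy(p_easy, num_calls)
        + (1 - easy_share) * vote_accuracy(p_hard, num_calls)
    )


if __name__ == "__main__":
    for k in (1, 3, 5, 7, 11, 21, 51, 101):
        print(f"{k:>3} calls: {system_accuracy(k):.3f}")
```

With these illustrative numbers, a handful of calls beats a single call, while a very large number of calls does worse than either, because the majority on hard questions converges on the wrong answer.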

“For critical applications where quality is important (e.g. medicine), then it’s worth making a few LLM calls on each task to increase performance. In less critical settings where we need to process a large number of tasks (e.g. summarizing many documents) or when speed is important, then it’s fine to use fewer calls,” Professor Zou advised.

What About ChatGPT’s Competitors?

Google, Microsoft, Anthropic, and several other tech companies have released their own LLMs, each promising advanced functionalities and applications.

Some X users say they prefer Google’s Gemini, previously Bard, over ChatGPT. One game developer explained: “Gemini has been great so far – produces better search results and writing isn’t as rigid than ChatGPT”.

Anthropic claims that its newly launched AI model Claude 3 “outperforms its peers on most of the common evaluation benchmarks for AI systems.” Anthropic said the new model “exhibits near-human levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence.”

Despite such claims, all emerging LLMs face increasing scrutiny as their user base grows. With high promises, users understandably expect high-quality results.

Given that ChatGPT remains the most widely used LLM, it isn’t surprising that it is the focus of user feedback. Other LLMs also face unique challenges as they compete for AI dominance.
