Key Takeaways
Increasingly sophisticated voice-enabled chatbots are rapidly emerging as a game changer for the customer service industry, with many companies embracing AI as a more efficient way to deal with customer requests and inquiries.
With chatbots already doing jobs traditionally performed by human workers, the prospect of AI with truly human-like intelligence has come so tantalizingly close that some people think it could be achieved before the end of the decade.
According to Elon Musk, the breakthrough will come in 2025 or 2026. Rather more vaguely, Sam Altman predicted that human-level AI would land in the “reasonably close-ish future.” More recently, former OpenAI board member Helen Toner told a U.S. Senate hearing that artificial general intelligence (AGI) “could be as close as one to three years away.”
In testimony to the Senate Judiciary Committee hearing on AI oversight, Toner said that there is a “disconnect” between how industry insiders view AGI and how it is perceived by the general public.
“Talk of human-level AI is often treated as either science fiction or marketing hype,” she said. “But many top AI companies, including OpenAI, Google and Anthropic, are treating building AGI as an entirely serious goal.”
Moreover, she said many people inside those companies think they might reach that goal within 10 or 20 years. Some, she added, “believe [it] could be as close as one to three years away.”
Toner went on to criticize the AI industry’s calls for regulatory restraint, arguing that “a wait-and-see approach to policy is not an option.”
While Toner’s discussion of AGI remained largely abstract—“smarter than,” “as capable as,” and so on—some AI advancements have already achieved uncanny approximations of human qualities.
Whether via phone or online messaging, traditional automated customer service systems are limited to the point that they can often be frustrating. But a new generation of conversational chatbots powered by large language models is changing that.
To help build smarter, more useful AI agents, OpenAI recently teamed up with T-Mobile to build IntentCX, a new bot that integrates the mobile carrier’s wealth of customer data.
Meanwhile, PolyAI recently partnered with OpenTable to let restaurants take reservations using voice-enabled AI.
Incidentally, OpenTable was discussed as an example by entrepreneur and investor David Sacks in a recent conversation about human-like AI.
Describing an OpenAI product roadmap during the All-In Summit 2024 in Los Angeles on Sept. 16, Sacks suggested the company plans to extend its models to a far wider range of applications.
The roadmap Sacks alluded to touted the “PhD-level” reasoning prowess of OpenAI’s latest models, dubbed o1.
Thanks to o1’s aptitude for complex reasoning, he said, the new generation of AI will “have the ability to use tools” that go far beyond today’s software integrations.
Sacks continued: “This ability means agents will soon be able to interact with all kinds of websites and software systems, enabling them to carry out more complex tasks like making a restaurant reservation.” Where things get “really crazy,” he added, “is when the phone gets picked up on the other end, and that could be an AI, too.”
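The agent behavior Sacks describes—a model deciding which tool to invoke, the agent executing it, and the result flowing back—can be sketched in miniature. Everything below (the `StubModel`, the `book_table` tool, the argument names) is an illustrative assumption for the sake of the sketch, not any vendor’s actual API:

```python
# Minimal sketch of an agent tool-use loop with a stubbed "model".
# All names here are hypothetical; a real agent would call an LLM API
# and a real reservations service.
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict


# Hypothetical "tool": in a real agent this would hit a reservations API.
def book_table(restaurant: str, party_size: int, time: str) -> str:
    return f"Booked {restaurant} for {party_size} at {time}"


TOOLS = {"book_table": book_table}


class StubModel:
    """Stands in for an LLM that decides which tool to call and with what arguments."""

    def plan(self, request: str) -> ToolCall:
        # A real model would parse the natural-language request;
        # here the decision is hard-coded for illustration.
        return ToolCall("book_table",
                        {"restaurant": "Quince", "party_size": 2, "time": "19:30"})


def run_agent(request: str) -> str:
    call = StubModel().plan(request)        # 1. model picks a tool and arguments
    result = TOOLS[call.name](**call.args)  # 2. agent executes the tool
    return result                           # 3. result is returned to the user


print(run_agent("Book a table for two at Quince at 7:30pm"))
```

The point of the loop is that the model never touches the outside world directly; it only emits a structured request that the agent runtime validates and executes, which is what lets the same pattern scale from restaurant bookings to arbitrary websites and software systems.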
However, although voice AI has many benefits, there are also concerns about its potential misuse.
When it released GPT-4o, its first voice-enabled LLM, in May 2024, OpenAI acknowledged that the new audio modalities “present a variety of novel risks.”
This could explain why the firm has held off on opening up GPT-4o’s voice function to developers. Existing APIs grant access only to the model’s text and vision capabilities. However, Sacks said a voice API is currently being tested in beta.
The dangers of voice AI include unauthorized impersonation. Indeed, OpenAI itself has been accused of emulating Scarlett Johansson’s voice without her permission.
After she turned down an offer from OpenAI to be the voice of its new chatbot, Johansson said she was “shocked, angered and in disbelief that [CEO] Mr. Altman would pursue a voice that sounded so eerily similar to mine.”
Vocally equipped AI agents also pose a significant fraud risk, enabling cybercriminals to impersonate real people for their own nefarious purposes.
When Alan Turing proposed a test in 1950 to determine a machine’s ability to exhibit behavior that is indistinguishable from that of a human, he said the conversation should be limited to a text-only channel so the result wouldn’t depend on AI’s ability to speak. But as Sacks observed, “we are already at the point” where voice-enabled chatbots can convincingly replicate human customer service agents.
More than 70 years later, it feels like ChatGPT and its peers are on the cusp of passing the Turing test. However, the final hurdles to truly human-like intelligence may prove the most difficult to surmount.
Some of the barriers to AI convincingly passing as a human appear to be designed into frontier AI models by their developers. Try as you might, ChatGPT won’t pretend to be human, and attempts to get it to engage in Turing’s experiment invariably fail.
Given the risks of AI impersonation, hard-coding models with a commitment to transparency is probably a good idea. However, bad actors will always seek to bypass such built-in guardrails.
In an interview with CCN, Ken Joseph, the head of U.S. regulatory consulting at Kroll, said the rise of convincing deepfake audio and video means businesses need to completely reimagine their security protocols. “As the threat becomes more sophisticated, so too must the controls and verification systems that firms put in place,” he observed.
Of course, the same advice also applies to everyday consumers and platform users. In the age of AI, previous standards of realness (a face on camera, a voice at the end of the line) no longer apply. Turing originally worried that robotic voices would give AI away, but today, they could be the first aspect of the AI stack that can convincingly pass his test.