OpenAI is rolling out a major update to its voice mode, one that promises to transform the AI chatbot into something closer to a virtual assistant.
Of course, this latest addition to OpenAI’s suite of extremely capable AI technologies has raised several questions about deepfakes, privacy, copyright, security, and general concerns around digital safety.
OpenAI’s most powerful chatbot, powered by GPT-4o, is getting a more advanced “Voice Mode” update, which is now rolling out to a limited group of paid users.
OpenAI first developed the underlying technology, which it dubbed ‘Voice Engine’, in late 2022, and began testing it with a “small group” of trusted users in late 2023. Highlighting its powerful capabilities in a March 2024 blog post, OpenAI explained that Voice Engine only needs:
“[…]a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.”
Over recent months, OpenAI tested the model’s capabilities to find weak spots in the tech. This involved a trial of more than 100 people who spoke 45 different languages across 29 geographies. The rollout is limited to certain paid members for now, with a full launch planned once the tool’s safety has been properly trialed.
Aware that there would be some concerns, OpenAI has said that advanced voice mode only speaks in four preset voices and is built to block outputs that deviate from them. It has also set up measures to block requests for violent or copyrighted content.
Additionally, in a June 25 post to X, OpenAI explained that the limited rollout of its advanced Voice Mode would still need an additional month before launching wholesale. In its reasoning, OpenAI said that it was working to improve the model’s ability to “detect and refuse certain content”, amongst other things.
The firm maintained that it has a “high safety and reliability bar”, and the timeline of release will be determined by whether or not the new tech can meet those standards.
OpenAI had previously attempted to roll out a new voice model, Sky, in May 2024, but pulled it back shortly before facing backlash from Scarlett Johansson. The actress claimed that she had declined an offer from Sam Altman to provide the voice for the bot. She further alleged that OpenAI instead chose to mimic her voice for Sky and released it anyway.
It seems as though AI has a habit of raising ethical and moral dilemmas at every turn. Most of all, AI’s ability to convey authenticity through deepfakes and voice replication appears to be one of the primary concerns.
Considering the real-world ramifications that spring from misinformation or miscommunication through text alone, AI impersonating humans poses a far greater threat. This could see new fraudulent schemes and scams arise, or even significant disruptions to vital infrastructure.
Even OpenAI admits that its new voice tool could carry serious risks, “which are especially top of mind in an election year.” That said, the firm does seem committed to building it responsibly and claims to be working with partners of all shapes and sizes.
“We are engaging with U.S. and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build.”
Trust in the digital space is fragile. The Internet is rife with phishing attacks, privacy breaches, data leaks, spam emails, and sketchy calls from unknown numbers via WhatsApp, to name a few. If AI continues to advance unchecked, without proper regulation, things will get quite messy very fast.
OpenAI’s rollout of new voice capabilities for GPT-4o is, arguably, yet another step toward an ethical quandary that we may be unable to pull back from.