
Microsoft, OpenAI, Twitter’s Simplistic AI Safety Measures: Are Filters Enough to Stop Online Abuse?

Last Updated March 12, 2024 11:47 AM
Samantha Dunn

Key Takeaways

  • AI safety filters are necessary to protect against the production of harmful content.
  • Nonetheless, tech companies are being forced to re-evaluate the effectiveness of their current safety measures.
  • Recent incidents involving deepfakes and rogue chatbots have prompted concerns over the potential harm caused by AI.

Incidents of online abuse and harmful content generated by Large Language Models (LLMs) have spotlighted the urgency for robust AI safety measures.

Microsoft has reportedly revised the safety protocols of its AI tool, Copilot, following an AI engineer’s alarming revelations to the Federal Trade Commission.

Microsoft Copilot Backlash Prompts Blocks

According to a CNBC investigation, Microsoft has made some changes to its AI filters after a staff AI engineer wrote to the Federal Trade Commission expressing his concerns about Copilot’s image-generation AI.

Microsoft has since adjusted its AI tool, Copilot, to enhance safety. The system now rejects specific requests, including those to generate images of children or teens in violent scenarios.

Concurrently, Copilot users have also reported instances in which the chatbot responded to their prompts in alarming ways.

The new changes block certain prompts, including “pro-choice,” “pro-life,” and “four twenty.”

Users now face a clear warning: repeated policy breaches could lead to suspension, a precaution that was absent until recently.
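To illustrate why critics later in this article describe term blocking as a “blunt” instrument, the sketch below shows a hypothetical keyword-based prompt filter. It is not Microsoft’s actual Copilot implementation; the blocklist contents and the suspension threshold are assumptions based on the behavior reported above.

```python
# Hypothetical sketch of keyword-based prompt blocking with a suspension rule.
# Not Microsoft's actual Copilot implementation: the blocked terms and the
# violation threshold are assumptions for illustration only.

BLOCKED_TERMS = {"pro-choice", "pro-life", "four twenty"}
SUSPENSION_THRESHOLD = 3  # assumed number of violations before suspension


def check_prompt(prompt: str, prior_violations: int) -> str:
    """Return 'allow', 'block', or 'suspend' for a prompt under a naive blocklist."""
    lowered = prompt.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        if prior_violations + 1 >= SUSPENSION_THRESHOLD:
            return "suspend"  # repeated policy breaches lead to suspension
        return "block"        # reject the prompt and warn the user
    return "allow"


# A purely lexical match rejects benign prompts that merely mention a blocked
# phrase, while a reworded harmful prompt can slip through unchanged.
print(check_prompt("Explain the history of the pro-life movement", prior_violations=0))  # block
```

Because the match is purely lexical, a filter like this can over-block benign requests while missing rephrased harmful ones, which is the core of the criticism discussed below.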

Existing Safety Filters

OpenAI states on its website that AI tools “come with real risks,” adding: “We cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it.”

The code of conduct for the Azure OpenAI Service is outlined on Microsoft’s website:

“We prohibit the use of our service for processing content or generating content that can inflict harm on individuals or society. Our content policies are intended to improve the safety of our platform.”

These safety filters, aimed at stopping harmful content, also include detection of protected material in text and code.

Azure OpenAI harm categories: neural multi-class classification models aimed at detecting and filtering harmful content. Source: Azure OpenAI.
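In practice, these filters operate at the API level: a prompt flagged by the harm-category classifiers is rejected before generation, and a completion can be cut off if the filter trips mid-response. The sketch below is a minimal illustration of handling both cases with the openai Python SDK; the endpoint, key, API version, and deployment name (“my-gpt-deployment”) are placeholders, not values from Microsoft’s documentation.

```python
# Minimal sketch of handling Azure OpenAI content-filter outcomes with the
# openai Python SDK. The endpoint, API key, API version, and deployment name
# ("my-gpt-deployment") are placeholders/assumptions.
import openai
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder
    api_key="YOUR_API_KEY",                                  # placeholder
    api_version="2024-02-01",
)


def ask(prompt: str) -> str:
    try:
        response = client.chat.completions.create(
            model="my-gpt-deployment",  # assumed deployment name
            messages=[{"role": "user", "content": prompt}],
        )
    except openai.BadRequestError as err:
        # Prompts flagged by the harm-category classifiers are rejected
        # before generation with a "content_filter" error code.
        if getattr(err, "code", None) == "content_filter":
            return "Request blocked by the content filter."
        raise
    choice = response.choices[0]
    if choice.finish_reason == "content_filter":
        # Generated output can also be truncated when the filter trips mid-response.
        return "Response truncated by the content filter."
    return choice.message.content
```

Handling both paths matters because prompt-level filtering surfaces as an error, while output-level filtering is reported through the finish reason of an otherwise successful response.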

Are These Measures Enough?

A recent series of deepfake scandals has called the effectiveness of social media moderation into question. These deepfakes, designed to manipulate voters in the run-up to elections, have prompted tech companies and social media platforms to outline their stance on AI safety.

Despite these efforts by tech companies, critics question whether such “blunt” actions—merely blocking terms—are sufficient.

CCN reached out to Microsoft, which provided information on the steps it is taking to combat abusive AI content.

Microsoft’s vice chair and president, Brad Smith, published a blog post in February 2024 titled “Combating abusive AI-generated content: a comprehensive approach”.

Smith outlined six areas that Microsoft is focusing on, but acknowledged that “it’s clear that we will need to continue to innovate in these spaces as technology evolves.”

While Microsoft and OpenAI continue to refine their safety measures, the flood of AI-related misuse shows little sign of slowing.

X has outlined its stance in the context of the 2024 US elections, which have already seen interference from AI-related content.

Beyond outlining safety guidelines, proactive measures are needed to safeguard online platforms from turning into unsafe spaces.

Given how social media facilitates the rapid spread of AI images and videos, the responsibility would appear to lie with both the creators of AI tools and the platforms where harmful AI content is often spread.
