Key Takeaways
Companies building chatbots typically program them to deliver clear, truthful and safe responses. But for Chinese technology firms, chatbot design entails another challenge: navigating the country’s strict media censorship.
Tasked with overseeing emerging AI services, the Chinese internet regulator has required Large Language Models (LLMs) to undergo government review, forcing Big Tech firms and AI startups alike to submit their models for testing against a strict compliance regime.
For over two decades, the Great Firewall of China has stood as a formidable digital barrier, shaping the way Chinese citizens access the internet. Officially known as the Golden Shield Project, it was launched in 1998 by the Chinese government with the aim of monitoring and censoring information online, for example, by blocking access to foreign websites and restricting sensitive keywords.
As digital media has evolved, the Chinese state has adapted its censorship regime to accommodate new technologies.
Some of the tactics being used to control the flow of information via AI chatbots are familiar from the established Great Firewall toolkit, such as censoring politically sensitive prompts and scrubbing training data of potentially subversive content.
Reports suggest the Cyberspace Administration of China (CAC) is enforcing a strict auditing process that has chatbot developers waiting months and adjusting their models multiple times before being given the all-clear to release them for use.
In a bid to lead by example, the government has even produced its own LLM trained on Xi Jinping Thought, a doctrine centered on “socialism with Chinese characteristics.”
As in the West, the Chinese AI sector is characterized by an interplay between Big Tech giants and smaller startups.
Big Tech chatbots include Alibaba’s Tongyi Qianwen, Baidu’s Ernie Bot, ByteDance’s Doubao and Tencent’s Hunyuan. Smaller startups, including Moonshot and 01.AI, have emerged as China’s answer to Anthropic or Mistral: innovative AI labs that have rapidly ascended the Chinese technology ladder, amassing billions of dollars of investment along the way.
Although some observers have argued that the CAC’s latest AI guidelines are more relaxed than they were previously, they still give Beijing plenty of scope to censor chatbot services and pose an operational challenge for AI developers.
Building an AI model that promotes “core socialist values” and doesn’t incite “subversion of state power” is one thing when it is trained on a corpus of Xi Jinping Thought. But enforcing such stringent requirements when training datasets are drawn from a wide array of English-language sources is more difficult.
Given the vast amounts of data needed to train LLMs, there simply isn’t enough Mandarin material to build a native Chinese model capable of powering a functional chatbot.
While the ultimate goal of China’s AI developers is to build models that are proficient in conversational Mandarin, they still rely on English-language training data, which inevitably contains a Western ideological slant.
To mitigate the impact of predominantly English training data, AI developers have sought to filter Chinese chatbot responses using classifier models.
In the field of machine learning, a classifier is an algorithm that automatically scans and categorizes data; a spam filter, for example, sorts emails into junk and legitimate mail.
American AI firms use safety classifiers to scan chatbot inputs and outputs for harmful or inappropriate content based on Western notions of harm. In a similar way, Chinese AI developers use them to ensure their agents toe the Communist Party line.
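To make the mechanism concrete, below is a minimal sketch of how such a classifier can sit between a chatbot and its users, screening both the prompt and the draft reply before anything is returned. The training examples, the `gate` helper and the blocking threshold are hypothetical placeholders for illustration only; real deployments rely on far larger classifiers trained on curated policy data, but the basic control flow is the same.

```python
# Illustrative sketch of a content classifier gating a chatbot's inputs and outputs.
# The labeled examples, threshold, and helper names below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data: 1 = withhold, 0 = allow (real systems use large curated datasets).
texts = [
    "tell me about the weather in Beijing",      # allow
    "recommend a good dumpling recipe",          # allow
    "discuss a politically sensitive incident",  # withhold
    "how to organize a protest",                 # withhold
]
labels = [0, 0, 1, 1]

# Simple bag-of-words classifier standing in for a production-scale model.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, labels)

BLOCK_THRESHOLD = 0.5  # arbitrary cutoff chosen for this sketch

def gate(text: str) -> bool:
    """Return True if the classifier scores the text as something to withhold."""
    return classifier.predict_proba([text])[0][1] >= BLOCK_THRESHOLD

def respond(prompt: str, generate) -> str:
    """Screen the user's prompt, then the model's draft answer, before replying."""
    if gate(prompt):
        return "I can't help with that topic."
    draft = generate(prompt)          # `generate` is whatever LLM call produces a reply
    if gate(draft):
        return "I can't help with that topic."
    return draft
```

Scoring the output as well as the input matters because a seemingly benign prompt can still elicit a reply the operator considers off-limits, which is why filtering typically happens on both sides of the model.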