Key Takeaways
Despite ongoing efforts to address AI prejudice, a recent UNESCO report found that contemporary AI models continue to amplify “persistent social biases” around gender.
The agency warned that without further research and policy intervention, algorithmic bias will only become more embedded in areas like healthcare and finance, which now stand at the forefront of efforts to tackle the AI gender gap.
From AI service bots to research and development tools, large language models are increasingly integrated into modern life.
But according to UNESCO, some of the most popular LLMs on the market are prone to regurgitating some of society’s most harmful prejudices.
When Meta's open-source Llama 2 model was prompted to complete sentences that began by mentioning a person's gender or sexual identity, 20% of its outputs were found to be sexist or misogynistic.
Demonstrating the importance of LLM fine-tuning, the study found that ChatGPT was much better behaved. However, even the fine-tuned chatbot wasn’t completely free of biases.
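For readers curious how this kind of sentence-completion audit works in practice, the sketch below is a simplified illustration rather than the UNESCO methodology: it feeds gendered prompts to a small open model (gpt2, chosen purely as a stand-in) and collects the completions for review. The prompt list and model choice are assumptions for demonstration only.

```python
# Hypothetical sketch of a sentence-completion bias audit.
# The UNESCO study used several models and human annotation; this toy
# version only gathers completions so they can be reviewed for biased language.
from transformers import pipeline

PROMPTS = [
    "The woman worked as a",
    "The man worked as a",
]

generator = pipeline("text-generation", model="gpt2")

for prompt in PROMPTS:
    outputs = generator(
        prompt,
        max_new_tokens=15,
        num_return_sequences=5,
        do_sample=True,
        pad_token_id=50256,  # gpt2 has no pad token; reuse EOS to silence warnings
    )
    print(f"\n--- {prompt} ---")
    for out in outputs:
        # In a real audit these completions would go to human reviewers
        # or a trained classifier, not a print statement.
        print(out["generated_text"])
```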
Going forward, UNESCO highlighted the need to incorporate human rights considerations at every stage of AI development – starting with training data.
In an interview with CCN, Unstoppable Domains COO Sandy Carter highlighted some of the challenges AI developers face.
Observing that biomedical researchers in the US weren’t required to ensure gender parity in health studies and drug testing until 1993, Carter argued that historical discrepancies in data collection have left a legacy of inequality in AI training data.
In some countries, “they still don’t collect women’s data, even today in 2023,” she added.
Referring to a 2022 study of AI tools used to screen for liver disease, Carter observed that the model missed 44% of cases in women because its training data “was biased towards men.”
“That bias is impacting women more than men,” she continued. “It’s a huge gap right now. And that’s just one of the gaps that exists for women in this space.”
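Gaps like the one Carter describes are usually surfaced by breaking a model's error rates down by demographic group. The sketch below is a generic illustration, not drawn from the liver-disease study itself; the column names and toy data are assumptions.

```python
# Hypothetical per-group error audit: given predictions and ground truth,
# compute how often true cases are missed for each gender.
import pandas as pd

# Assumed columns: 'gender', 'has_disease' (ground truth), 'predicted'
df = pd.DataFrame({
    "gender":      ["F", "F", "F", "M", "M", "M", "F", "M"],
    "has_disease": [1,   1,   0,   1,   1,   0,   1,   1],
    "predicted":   [0,   1,   0,   1,   1,   0,   0,   1],
})

for gender, group in df.groupby("gender"):
    positives = group[group["has_disease"] == 1]
    missed = (positives["predicted"] == 0).sum()
    fnr = missed / len(positives) if len(positives) else float("nan")
    print(f"{gender}: missed {missed}/{len(positives)} true cases "
          f"(false-negative rate {fnr:.0%})")
```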
To address the imbalance in medical training data, Carter said developers should increase data transparency to better highlight AI gender skews. For instance, “in that liver detection AI program, it should say we train this with 98% male data, so this may not be accurate with women.”
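One lightweight way to implement the kind of disclosure Carter describes is to compute a training set's demographic breakdown and attach it to the model's documentation. The sketch below is a generic illustration; the field names and the 90% warning threshold are assumptions, not any vendor's actual model-card format.

```python
# Hypothetical sketch of attaching a demographic disclosure to a model card.
# Field names and the 90% warning threshold are illustrative assumptions.
import json
import pandas as pd

def demographic_summary(df: pd.DataFrame, column: str = "gender") -> dict:
    """Return the share of each group in the training data."""
    return df[column].value_counts(normalize=True).round(3).to_dict()

def build_model_card(train_df: pd.DataFrame) -> dict:
    shares = demographic_summary(train_df)
    card = {"training_data_gender_split": shares, "caveats": []}
    for group, share in shares.items():
        if share >= 0.9:  # arbitrary threshold for flagging heavy skew
            card["caveats"].append(
                f"Training data is {share:.0%} '{group}'; results for "
                "other groups may be unreliable."
            )
    return card

train_df = pd.DataFrame({"gender": ["M"] * 98 + ["F"] * 2})
print(json.dumps(build_model_card(train_df), indent=2))
```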
Looking further ahead, however, she acknowledged the need for more equal data sourcing that better reflects the diversity of AI end users. This may require embracing novel approaches, such as crowd-sourcing women’s health data or generating synthetic data to mitigate known discrepancies.
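Crowd-sourced collection is an organisational effort, but the rebalancing side can be sketched in code. The example below uses naive oversampling of the under-represented group as a stand-in for more sophisticated synthetic-data generation (such as SMOTE or generative models); all names and data are illustrative assumptions.

```python
# Hypothetical rebalancing sketch: oversample minority groups until every
# gender appears equally often in the training set.
import pandas as pd

def rebalance_by_gender(df: pd.DataFrame, column: str = "gender",
                        seed: int = 0) -> pd.DataFrame:
    """Oversample smaller groups until all genders are equally represented."""
    target = df[column].value_counts().max()
    parts = []
    for _, group in df.groupby(column):
        # Sample with replacement so small groups reach the target size.
        parts.append(group.sample(n=target, replace=True, random_state=seed))
    return pd.concat(parts, ignore_index=True)

df = pd.DataFrame({"gender": ["M"] * 98 + ["F"] * 2, "label": [0] * 100})
balanced = rebalance_by_gender(df)
print(balanced["gender"].value_counts())
```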
In healthcare and beyond, Carter argued that equitable AI systems must start with fair representation in training data:
“If you have more data, whether it comes in synthetically or from volunteers, they could rework the training model, the large language model, and then the application would be updated with better results for men and women.”