Recently, BuzzFeed published an article featuring AI-generated images of Barbie dolls from different countries. The images were generated using the popular generative AI model MidJourney. They created a huge backlash on X (formerly Twitter), as the majority depicted blatant biases and stereotypes. For instance, a South Sudanese Barbie was holding a gun, and a German Barbie was reminiscent of a Nazi soldier. The article has since been removed from BuzzFeed.
Algorithm bias is very real and poses an even bigger threat than human bias. We tend to ‘legitimize’ or ‘validate’ results, and our own thinking, with technology. The ‘unconscious’ bias that algorithms carry can remain undetected, working in the background to create a systemic unfair advantage or disadvantage for a specific group of people. Worse still, it can multiply and scale quickly as the algorithm consumes more and more data tainted by biased results.
In a recent study, researchers at the AI start-up Hugging Face and Leipzig University generated 96,000 images using Stable Diffusion models. The results were disturbing, if unsurprising. For instance, when tasked with crafting images portraying CEOs or directors, the models predominantly generated images of white males. The researchers have since made the study available to the public as an interactive tool here.
In the past, Amazon had to shut down its AI-based recruiting tool because it showed bias against female candidates. Similarly, in 2016 Microsoft launched a Twitter chatbot named Tay for casual, playful interaction with users. Within 24 hours, however, it ‘learned’ from public comments and interactions and started tweeting racist and antisemitic remarks. It had to be shut down!
With a vast population exposed to AI today, in one form or another, understanding algorithm and data bias is a critical part of our Digital Literacy. It ensures we do not take the results generated by the likes of ChatGPT at face value, and that we do sufficient conscious, contextualised human fact-checking.
Algorithm Bias
Kate Crawford, a Distinguished Research Professor at New York University, explains algorithm bias in terms of Allocative and Representative Bias. Allocative bias occurs when an algorithm unfairly allocates an opportunity to a specific group, e.g. Amazon’s recruitment algorithm favouring men over women. Representative bias, on the other hand, occurs when an algorithm stereotypes, or ‘represents’, a certain group with a specific trait. For instance, COMPAS, a widely used algorithm in the U.S. criminal justice system, has been found to be representationally biased against black defendants: it is far more likely to flag a black defendant as having a higher risk of recidivism than a white defendant.
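One common way auditors quantify the kind of disparity found in COMPAS is to compare false-positive rates across groups: how often each group is flagged as high risk despite not reoffending. The sketch below illustrates this with invented toy data (the records and groups are hypothetical, not real COMPAS data):

```python
# Hedged sketch: comparing false-positive rates per group, a common fairness audit.
# A "false positive" here is a defendant flagged high-risk who did not reoffend.
# All data below is invented for illustration only.

def false_positive_rate(records):
    """records: list of (flagged_high_risk, reoffended) boolean pairs."""
    false_positives = sum(1 for flagged, reoffended in records
                          if flagged and not reoffended)
    negatives = sum(1 for _, reoffended in records if not reoffended)
    return false_positives / negatives if negatives else 0.0

# Hypothetical toy records for two demographic groups.
group_a = [(True, False), (True, False), (False, False), (True, True)]
group_b = [(False, False), (True, False), (False, False), (False, True)]

print(f"Group A false-positive rate: {false_positive_rate(group_a):.2f}")  # 0.67
print(f"Group B false-positive rate: {false_positive_rate(group_b):.2f}")  # 0.33
```

A large gap between the two rates, at similar overall accuracy, is exactly the pattern ProPublica’s analysis surfaced in COMPAS.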
Data Bias
Another aspect of bias in AI models is caused by data. For simplicity’s sake, I will broadly categorize it into two types: source and diversity.
LLMs are trained on vast amounts of previously human-generated data. The ‘source’ and content of the data these models consume are mostly unknown and, in many cases, prove detrimental to the neutrality of AI algorithms. Take the example of MidJourney. The company is under heavy criticism for massive copyright infringement, as it trains its models on a huge dataset of previously human-generated images. The sources of, and biases embedded in, these images are unknown, and whether they will systematically include or exclude certain groups of people unfairly is also a subject of debate at this stage.
Similarly, if an algorithm is trained on data that is not representative of the population, it produces tendencies and results that are skewed towards a specific point of view. The earlier example of Microsoft’s Tay is a case in point: the data was not diverse enough to yield representative results.
Aftermath
Based on the AI, Algorithmic, and Automation Incidents and Controversies (AIAAIC) database, Stanford HAI recently reported that over the past 10 years (2012-2021), the number of ethical issues associated with AI increased 26-fold, from 10 cases to 260. This rise in ethical concerns is a direct result of a lack of responsibility in AI algorithm development, and of biases being introduced into AI models, consciously or unconsciously.
A recent study by Bloomberg used Stable Diffusion, a text-to-image AI model, to generate 5,100 images of people as a representation of workers in 14 jobs. The analysis suggested that images generated for high-paying jobs (lawyers, engineers, etc.) were mostly of people with lighter skin tones, while people with darker skin tones were predominantly generated for prompts like “janitors” and “dishwashers”.
This is far from reality. The same report notes that only 3% of the images generated for the keyword “judge” depicted women, while in reality 34% of US judges are women! How this will change the worldview we hold today…you guessed it right…is yet to be known!
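The mechanics of such an audit are simple: label each generated image, compute each group’s share, and compare it to a real-world baseline. A minimal sketch, assuming hypothetical labels (only the 3% generated share and the 34% baseline come from the Bloomberg report):

```python
from collections import Counter

def group_share(labels, group):
    """Fraction of labels belonging to the given group."""
    return Counter(labels)[group] / len(labels)

# Hypothetical perceived-gender labels for 100 images generated from the
# prompt "judge" -- constructed to mirror the ~3% figure Bloomberg reported.
judge_labels = ["man"] * 97 + ["woman"] * 3

generated = group_share(judge_labels, "woman")  # 0.03
baseline = 0.34  # share of US judges who are women, per the report

print(f"generated: {generated:.0%}, baseline: {baseline:.0%}, "
      f"gap: {(generated - baseline) * 100:+.0f} percentage points")
```

Run across many prompts, gaps like this one reveal which occupations a model systematically under- or over-represents for a group.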
What is next?
A report by Gartner indicates that by 2025, 30% of marketing from large organizations will rely on AI-generated content. Another report by Bloomberg suggests that the generative AI market is set to reach $1.3 trillion by 2032.
This puts a huge responsibility on large organizations, governments and academia to join hands in the fight for ethical and responsible AI practices. A global standard providing data-lifecycle frameworks to validate and authenticate data sources needs to be introduced. Blockchain can play a major role in this data governance and modelling by providing an immutable, distributed and trusted network for validating content generated or consumed by AI models.
As for the general public, as I always advocate, improving Digital Literacy is the only way to make informed choices about the use of AI. Digital Literacy no longer means merely being able to use a computer; it extends to how we interpret, understand and connect the results generated by AI algorithms. With the power to reshape perspectives, AI holds the key to forging a brighter future, provided we remain vigilant and aware of its limitations.