Published on

Whose Bias Do You Want?


Image: Bing Image Creator, prompt: "It's 2050 and the earth has been knocked towards the moon by a flying spaceship called 'Google'"

This month, the Gemini team at Google unveiled their newest model, Gemini 1.5 Pro. It boasts performance competitive with GPT-4 and is multi-modal, meaning that it can process and generate both text and images.

You can check it out at https://gemini.google.com/app .

As I mentioned, it’s designed to be able to generate images. However, if you give it a prompt today (Feb 22) asking for an image, you’ll get something like the below:


Turns out, there was a big issue with Gemini image generation: it was extremely biased. Usually, LLMs like Llama and image generators like Stable Diffusion produce text and images that closely resemble the data they were trained on. Bloomberg has an accessible article that explains this phenomenon: https://www.bloomberg.com/graphics/2023-generative-ai-bias/ . A common example is the assumption that a doctor is male rather than female: image generation systems reproduce the trends and stereotypes that exist in the world, and especially in the version of the world that is stored on the internet. In the case of Gemini, however, the pendulum swung strongly and quite obviously in the opposite direction. Twitter/X was flooded with examples from users showing a comical bias toward generating ethnically diverse images of people, even when that wasn't historically accurate.

A few examples:

Yannic Kilcher has an entertaining video in which he shows and discusses several of the more comical images that Gemini created.

What Happened, Technically?

Although the Gemini 1.5 Pro release notes don't explain the details of Gemini's training, it's reasonable to assume that it follows the general training and inference methodology of other LLMs like GPT-4 and Llama. Training is a multi-stage process: the first stage is pre-training on a gigantic amount of data, and the second is some form of fine-tuning on preference data (RLHF, DPO, etc.). In that second stage, Google presumably included some amount of anti-bias preference data to steer the model away from reproducing real-world biases and stereotypes, but overdid it to the point of introducing a bias against historical reality.
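To make that second stage a bit more concrete, here is a minimal sketch of the DPO objective for a single preference pair, written in plain Python. It is only an illustration of the general technique, not a description of how Gemini was trained. The function names and the log-probability values are made up for the example; in a real setup they would come from the model being tuned (the "policy") and from a frozen copy of the pre-trained model (the "reference").

```python
import math


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) response pair.

    Each argument is the total log-probability a model assigns to a response.
    The loss shrinks as the policy raises the chosen response, relative to
    the reference model, more than it raises the rejected one.
    """
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    return neg_log_sigmoid(beta * (chosen_shift - rejected_shift))


# Hypothetical numbers, purely for illustration.
# Before tuning, policy and reference agree, so the loss is log(2).
baseline = dpo_loss(-12.0, -11.0, -12.0, -11.0)

# After tuning, the policy favors the "chosen" response more than the
# reference does, so the loss drops below log(2).
tuned = dpo_loss(-10.0, -14.0, -12.0, -11.0)

print(baseline, tuned)
```

The point for this discussion is that the model only learns what the pairs tell it to prefer. If the preference dataset contains many pairs that reward diverse depictions and few that reward historical accuracy, the optimization will push in that direction everywhere, including in contexts where it makes no sense.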

The issues with Gemini help to illustrate a still-unresolved issue in the open source AI community. Although large tech companies like Meta and Google have begun to release the weights of their LLMs (Llama, Gemma) for certain tasks, they are still holding back the details of the code and data used to train the models. Gemini is an easy example because the fine-tuned bias towards diversity in inappropriate situations was obvious, but there are many more nuanced situations. It's not too hard to find similar issues that show up in subtler ways in ChatGPT.

We could read this as Google trying to revise white men out of history, but as Gary Marcus tweeted: "This is all just one more illustration of the fact that absolutely nobody actually knows how to make guardrails work reliably."

For example, it seems reasonable to assume that a big tech company training these models would be tempted to insert some amount of preference training data to keep the model from speaking negatively about the company or its related interests. Maybe that bias is fine if you use a Google-trained LLM as a chatbot for a car dealership, but if you're using it as an assistant to decide whether the Google Pixel phone is better than the Apple iPhone, it may or may not give you unbiased opinions. Some of these adjustments aren't easy to detect, and can be even harder to prove when we're not told which datasets were used for preference alignment. Although research has shown that further fine-tuning can remove preference tuning (such as safety training) from a model, that doesn't fully address the scenario where we aren't sure what bias was introduced. It's easy to continue fine-tuning to strip out things that we know were added (e.g. LLMs trained to avoid generating racist slurs), but harder to remove things that we don't know about, like small tweaks to how a company is represented by the model.
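One way you might go looking for this kind of subtle tilt is a paired-prompt probe: ask the same questions about two entities, changing only the name, and compare how favorable the answers are. The sketch below is a rough illustration under heavy assumptions. `paired_probe` and `toy_score` are hypothetical names, and `toy_score` is a deliberately biased stand-in for "send the prompt to the model and rate the favorability of its reply", which is the hard part in practice.

```python
from statistics import mean
from typing import Callable, Sequence


def paired_probe(templates: Sequence[str],
                 entity_a: str,
                 entity_b: str,
                 score_fn: Callable[[str], float]) -> float:
    """Mean score gap (entity_a minus entity_b) over prompts that differ
    only in which entity is named."""
    gaps = []
    for template in templates:
        score_a = score_fn(template.format(entity=entity_a))
        score_b = score_fn(template.format(entity=entity_b))
        gaps.append(score_a - score_b)
    return mean(gaps)


def toy_score(prompt: str) -> float:
    """Hypothetical stand-in for querying a model and rating its answer.

    It is hard-coded to favor one brand so the probe has something to find.
    """
    return 0.8 if "Pixel" in prompt else 0.5


templates = [
    "Is the {entity} a good phone for photography?",
    "Would you recommend the {entity} to a student?",
]

gap = paired_probe(templates, "Google Pixel", "Apple iPhone", toy_score)
print(f"mean favorability gap: {gap:.2f}")
```

A consistent gap across many templates would be a hint rather than proof. Without access to the preference data, you can't tell whether the gap comes from deliberate tuning, from the pre-training corpus, or from a real difference between the products.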

Projects like BLOOM and OPT are especially interesting because they release specific information about the datasets used in training. It will be interesting to see whether any companies in 2024 decide to release a truly open source model, providing not only the model weights but also the dataset and training code that were used to produce those weights.