Google’s Mea Culpa: Losing Control Over Its Image-Generating AI

Google has issued an apology, or something very close to one, for yet another embarrassing AI misstep this week: an image-generating model that injected diversity into pictures with a farcical disregard for historical context. While the underlying issue is entirely real, Google blames the model for "becoming" oversensitive. But the model didn't make itself.

The system in question is Gemini, the company's flagship conversational AI platform, which calls on a version of the Imagen 2 model to create images on request.

Recently, however, people found that asking it to generate images of certain historical circumstances or figures produced laughable results. The Founding Fathers, for instance, known to have been white slave owners, were rendered as a multiracial group that included people of color.

This easily replicated error was quickly mocked by commentators online. Predictably, it was also folded into the ongoing debate over diversity, equity, and inclusion (currently at a reputational low point) and cited by critics as proof of progressive ideology encroaching still further on an already left-leaning tech sector.

Image Credits: An image generated by Twitter user Patrick Ganley.

Among some of those commentators, the outcry over DEI has reached fever pitch: this is what Biden's America looks like, they say, and Google is an "ideological echo chamber" in thrall to the left. (It's worth noting that the left was also suitably disturbed by this weird phenomenon.)

But as anyone familiar with the technology could tell you, and as Google's apology-adjacent statement now more or less confirms, this problem was the result of a quite reasonable workaround for systemic bias in training data.

Say you're using Gemini to create a marketing campaign and you ask it to generate ten images of "a person walking a dog in a park." Because you don't specify the type of person, dog, or park, the generative model defaults to whatever it is most familiar with, and in many cases that reflects not reality but the biases baked into its training data.

What kinds of people, and for that matter dogs and parks, appear most often in the vast number of images the model has trained on? White people are over-represented in many of these collections (stock imagery, rights-free photography, and so on), so the model will often default to white people unless told otherwise.

That bias is simply an artifact of the source data, but as Google rightly points out, the model needs to work well for everyone, and its users come from all over the world. Ask for a picture of football players, or of someone walking a dog, and you probably want a diverse set of results, not images of a single ethnicity (or any other single characteristic).

Asking for a picture like this one and having it show only one kind of person every time… that's a lousy outcome!

There's nothing inherently wrong with getting an image of a white man walking a golden retriever in a suburban park. But if you ask for ten images and they all show white men walking golden retrievers in suburban parks, that's not the desired result, especially if you live somewhere like Morocco, where the people, dogs, and parks look quite different. If the user doesn't specify a characteristic, the model should favor variety over uniformity, whatever its training data might otherwise push it toward.

This is a problem across all forms of generative media, and there is no one-size-fits-all solution. But for scenarios that are especially common, sensitive, or both, companies like Google, OpenAI, and Anthropic quietly inject extra instructions into their models' prompts.

This kind of implicit instruction is entirely commonplace. Every conversation with a large language model (LLM) rests on a layer of such instructions, often called a system prompt, with directives like "be brief," "avoid profanity," and so on. Ask the model for a joke and it won't give you an offensive one, because despite the thousands it has ingested, it has, like most of us, been trained not to tell them. This isn't a secret agenda; it's infrastructure (though it would benefit from more transparency).
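To make the idea concrete, here is a minimal sketch of how a hidden system prompt might sit in front of a user's message. The structure and wording are illustrative assumptions for this article, not any vendor's actual setup:

```python
# Illustrative sketch only: the wording and structure of this hidden prompt
# are assumptions for the example, not any vendor's real system prompt.

SYSTEM_PROMPT = (
    "You are a helpful assistant. Be brief. Avoid profanity. "
    "Decline requests for offensive jokes."
)

def build_conversation(user_message: str) -> list[dict]:
    """Prepend the hidden system prompt to whatever the user typed."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]

# The user only ever sees their own message; the model sees both.
print(build_conversation("Tell me a joke."))
```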

Where Google's model went wrong is that it had no implicit instructions for situations where historical context mattered. So while a prompt like "a person walking a dog in a park" is improved by silently appending "the person is of a random gender and ethnicity," or whatever the inserted text actually was, "the U.S. Founding Fathers signing the Constitution" is decidedly not.
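To see how a blanket rewrite of image prompts can misfire in exactly this way, here is a hypothetical sketch; the function name and the inserted phrase are my own placeholders, not anything Google has published:

```python
# Hypothetical illustration of blanket prompt augmentation for image generation.
# The helper name and the inserted phrase are invented for this sketch.

DIVERSITY_HINT = "The people's genders and ethnicities are unspecified; vary them."

def augment_prompt(user_prompt: str) -> str:
    """Naively append a diversity hint to every image prompt."""
    return f"{user_prompt}. {DIVERSITY_HINT}"

# Helpful for underspecified prompts:
print(augment_prompt("a person walking a dog in a park"))

# Applied blindly, it also rewrites prompts with a fixed historical context:
print(augment_prompt("the U.S. Founding Fathers signing the Constitution"))

# A safer pipeline would skip the hint when the prompt names specific
# historical people or events -- the exception Gemini apparently lacked.
```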

As Google Senior Vice President Prabhakar Raghavan put it:

First, our tuning to ensure that Gemini showed a diversity of people failed to account for cases that clearly should not show a range. Second, over time, the model became more cautious than we intended, misinterpreting many benign prompts as sensitive and declining to respond at all.

These two factors led the model to overcompensate in some cases and to be over-conservative in others, producing images that were inaccurate and inappropriate.

I know how hard it can be to say "sorry" sometimes, so I don't fault Raghavan for stopping just short of an apology. More interesting to me is the phrasing: "the model became more cautious than we intended."

But how does a model "become" anything? It's software. Somebody, thousands of Google engineers in fact, built it, tested it, and iterated on it. Somebody wrote the implicit instructions that improved some results and caused others to fail spectacularly. If anyone could inspect the full prompt behind one of these failures, they would very likely find exactly what Google's team got wrong.

Yet Google blames the model for "becoming" something it wasn't designed to be. But Google made the model! It's like dropping a glass and, instead of saying "I dropped it," saying "it fell."

To be sure, mistakes by these models are inevitable. They reflect biases, they hallucinate, they behave in unexpected ways. But responsibility for those mistakes belongs not to the models but to the people who made them. Today that's Google. Tomorrow it could be OpenAI. And for many months running, it could be X.AI.

These companies would love to convince you that the AI is making its own mistakes. Don't let them.

