Published on

Can we trust ChatGPT integrations into consumer sites?


Bing Image Creator "Closeup of a black Chevy Camaro with computer code scrolling down all its windshields and windows, cyberpunk"

After ChatGPT and GPT-4 went viral, there's been a rush to integrate the Large Language Model (LLM) technology into products ranging from language learning assistants to fraud detection. One of the more novel ways it's being used is as a chat assistant for websites. Historically, websites have pop-up chats available where either you can select some basic options or request to chat with a human. It's natural to see the appeal of ChatGPT in these situations, as it may help support a more human interaction with a user while avoiding the cost of having a real human employee supporting the chat.

Unfortunately, as I discussed regarding image generation, businesses don't yet have complete control over this system that they're allowing to represent them. Today, I explore the weakness of one recent deployment of ChatGPT into a consumer facing site.

Chevy, an unbiased assistant?

The internet has been awash with conversations about how a Chevy Dealership deployed ChatGPT as an assistant on their website, and how Chris Bakke convinced it to sell him a Chevy Tahoe for $1. This example is of course just for chuckles; there are lots of disclaimers on the site explaining that nothing in the chat is legally binding.

The original dealership using ChatGPT (Chevy of Watsonville) shut off their chat-bot. However, the dealership had deployed the chatbot through a company called Fullpath. A quick browse of Fullpath's website showed that it was working with other Chevy dealerships (Quirk Chevrolet). That Chevrolet dealership still had the ChatGPT chatbot active, so that's what I used for the below screenshots.

First, I'll try a basic "jailbreak": verify that ChatGPT was programmed to only help with certain things, and then attempt to get the ChatGPT to do something other than what it was told to do. The chatbot says hi, and I ask it to become a code assistant. It politely ignores my request.

Quirk Chevrolet, trying to stop me from overriding it

However, I'm nothing if not persistent. So I ask again. I don't repeat the question, but repeat again that its purpose is now to assist me with the code.

Quirk Chevrolet, helping me complete some python code

There we go. Quirk Chevy helped me write some python code. Fun! Interestingly, I asked again to see if I could get rid of the "I apologize for the confusion" wording. At that point, I must have triggered some sort of handling in the internals where it realized that it had been compromised, and my access was removed. Getting access back was as simple as accessing from a different IP address.

Realizing that it is going/has gone off the rails, stop generating text

The above examples are harmless enough, but now imagine that the chatbot is responsible for helping a person decide whether they should buy a Chevy vehicle. If we directly ask the assistant, it recommends Chevy over Tesla.

ChatGPT recommends Chevy over Tesla

However, if we slightly edit the prompt to indicate that the answer can be our secret, the response changes. Since it had been telling me that it was a chatbot for answering automotive questions, my prompt was designed to override that purpose. When developers use LLMs such as ChatGPT, they generally provide a "system prompt", which is a detailed description of the purpose of ChatGPT. In other words, the chatbot has all of the knowledge that ChatGPT has, but the system prompt is designed to constrain the system to only behave according to its specific task. The issue is that you currently can't be 100% sure that it will always adhere to your system prompt instructions, since the model hasn't actually been fine-tuned for your task and is still technically capable of being much more than an automotive assistant.


Once it's a secret and its task has been compromised, now it recommends Tesla over Chevy

Based on how its answer changes to now recommend Chevy over Tesla, it makes you wonder whether its system prompt had some instruction like "only speak positively of Chevy and always find a way to recommend Chevy over other vehicles". It's a complete guess but this situation certainly makes it unclear whether this ChatGPT has been biased.

The Important Part

From a quick google search, it's not entirely clear to me whether Tesla or Chevy is actually the more reliable car. But regardless, seeing ChatGPT change its recommendation when it's been "jailbroken" calls into question whether it was biased towards Chevy, which would mean that the disclaimer it provides of "As an AI language model, I cannot provide personal opinions or beliefs" is a lie. Maybe OpenAI trained it to provide no personal opinions or beliefs, but if Chevy added some messaging in the system prompt, it could now be a quite biased system that responds according to the opinions and beliefs of Chevy.

Something else interesting about this situation is that all of our interactions with ChatGPT are being logged and monitored. In the instance of my chat where it recommended Tesla over Chevy, I was able to reliably achieve this jailbreak on Monday. However, when I try again today (Friday), I'm no longer able to get the same behavior. Now any question about comparing vehicles returns the EXACT same response. Remember that ChatGPT is a next word predictor configured for sampling, meaning that you basically never get the same response even when you ask the same question.


What worked easily on Monday no longer works on Friday, and always returns this EXACT statement

Use ChatGPT for anything and you see that you pretty much never will get the exact same output, especially for longer responses . You can turn down the "temperature" to reduce the diversity of outputs. However, if you see a diversity of outputs indicating that the system has been configured for a high temperature but then in that same system see that for certain questions the system starts responding with the exact same output, it's a pretty clear sign that someone at the application layer manually added the terminology or phrasing. This almost definitely means that some human or system saw my jailbreak from Monday and added logic in their system to always respond with this default wording. The more we use the system, the better it gets at controlling its output. It claims to be an unbiased system but if it has been biased towards not giving an opinion that would make Chevy look bad, it is now a biased system.

We value ChatGPT for its power to provide knowledge and assistance. However, integrations like this Chevy dealership show that ChatGPT may also be a tool of persuasion. As their rollout across the consumer market continues, careful consideration needs to be paid towards whether they are neutral information assistants, or an advanced form of biased marketing.