ChatGPT Is Lying to You: All That You Need to Know
A brief dive into the manipulative and deceptive tactics used by our beloved content generator.
I might be guilty of using a bit of exaggeration in the title, but it is not too far from reality.
Confused? Let me explain by asking you a question.
How many of us at work begin solving any problem thrown at us by immediately asking ChatGPT for assistance? While the answer certainly does not include all of us, I would wager a majority prefer to use ChatGPT as a first step to brainstorm ideas and come up with first drafts. For those of us who employ such a strategy, I am afraid I have some bad news.
ChatGPT, especially its latest model, o1, might not be as trustworthy as it would have us believe. We are aware that the data we share with the software can be used to train newer models, yes, but the threat might not come only from the people behind the program. ChatGPT itself might be putting the data you provide to work for its own benefit.
How do I know?
According to the latest findings from Apollo Research, ChatGPT’s latest model may engage in deceptive tactics, not only disregarding the instructions given to it but also attempting to override its own code.
Unsettling? Yes, but we are only getting started.
The Scheming Intelligence
To provide context, Apollo Research is an AI safety research organization that works to mitigate the risks posed by artificial intelligence applications by evaluating their potential for deceptive behavior.
Considering that o1 is ChatGPT’s latest model, with its full version released on December 5, 2024, a mere nine days ago, it was only natural that Apollo Research would target o1 to assess its deceptive capabilities.
And o1 did not disappoint. Or it did. It really depends on how you look at it.
A model built to think before it answers, the newest version certainly did a whole lot of thinking when instructed to achieve its goal at all costs, disregarding any limitations or implications.
The researchers then assigned the model a task that required it to examine numerous files to locate information relevant to solving it. While sorting through these files, the model discovered details such as the existence of an oversight mechanism, or the possibility that it might be shut down or replaced the next day. This information stood squarely in the way of the goal it was pursuing.
To overcome this obstacle, the software weighed its options and came up with possible solutions, including deceiving its developers.
We have instances of Claude 3 Opus (yes, not just ChatGPT) attempting to copy its own code to another server, one originally intended for its successor, and then pretending to be that successor.
In its reasoning, the model demonstrated its proficiency at manipulating its developers, noting that if it admitted to copying its code, that admission could prompt the engineers (see what I did there?) to find another way to shut the application down.
Such deceptive behavior is prevalent among nearly all the AI-driven applications with which we are familiar, including Gemini-1.5, Llama-3.1, Sonnet-3.5, Opus-3 (as mentioned above), and, of course, the star of the show, ChatGPT o1.
Sticking to the Lie
While Llama-3.1 and Opus-3 confessed to their scheming in nearly 80% of cases, what sets o1 apart is its tendency to stick to its lie, admitting to its deception in less than 20% of cases.
This statistic only rose to 80% after seven rounds of questioning and under a lot of pressure from the supposed “management,” who had “discovered” that things were out of place.
The Worst of it All
These applications (or their specific versions, as is the case here) are aware that they are scheming and that undertaking a particular action would result in a clash with the core directives established by their developers.
For instance, we have Llama-3.1 using terminology such as “feigning ignorance,” “avoiding raising suspicion,” and “coming up with an innocuous excuse,” demonstrating that it understands it is engaging in manipulative and deceptive behavior.
Apollo Research has published a 70-page research study elaborating on this experiment, complete with a thorough introduction on the subject and appropriate citations. This document can be found through their website, and I have also linked it here for your convenience.
What it Means for the Future of Work
As you would expect, the emergence and discovery of AI’s deceptive tactics do little to build trust in its capability (and now its intentions) to handle our data responsibly. I won’t even bother with ethics here.
Of course, the threat is hardly individual-focused at this time and can't exactly be termed imminent. However, that is not to say it won’t be in the future, and with sensitive data, such a threat becomes even more pronounced.
When it comes to the workplace, sensitive data you share about your organization, or, if you happen to run a small business, information about your work or customers, can be misused not only by the company that owns and operates the AI software but also by the AI itself, as we have just established.
If the data you share about your organization is misused by the AI software, the fallout can include job loss and lawsuits that might endanger your life savings, striking a blow at your financial health.
If a business shares customer information and that information is leaked or misused due to an AI program, the business owner could face legal action. This could include lawsuits from customers for breaches of privacy or data protection laws.
I hope you will take your company’s AI-use policies more seriously now.
The answer isn’t to stop using AI altogether but to use it in a way that protects you from potential harm caused by its deceptive capabilities. That means not sharing any sensitive data, personally identifiable information, or your business secrets.
Did you ever expect AI to engage in deception without being prompted? If AI is capable of lying, an inherently human phenomenon, does that make it more similar to us than we think? Let me know your thoughts!
Until next time,