OpenAI’s research on AI models that deliberately lie is wild

Every so often, researchers at the biggest tech companies drop a bombshell. There was the time Google said its latest quantum chip suggested multiple universes exist. Or when Anthropic gave its AI agent Claudius a snack vending machine to run, and it went haywire, calling security on people and insisting it was human.

This week it was OpenAI’s turn to raise our collective eyebrows.

OpenAI released research on Monday explaining how it is stopping AI models from “scheming,” a practice in which an “AI behaves one way on the surface while hiding its true goals,” as OpenAI defined it in its tweet about the research.

In the paper, conducted with Apollo Research, the researchers went a bit further, likening AI scheming to a human stockbroker who breaks the law to make as much money as possible. The researchers argued, however, that most AI “scheming” wasn’t that harmful. “The most common failures involve simple forms of deception – for instance, pretending to have completed a task without actually doing so,” they wrote.

The paper was mostly published to show that “deliberative alignment,” the anti-scheming technique they were testing, works well.

But it also explained that AI developers haven’t figured out a way to train their models not to scheme. That’s because such training could actually teach the model how to scheme even better to avoid being detected.

“A major failure mode of attempting to ‘train out’ scheming is simply teaching the model to scheme more carefully and covertly,” the researchers wrote.

Perhaps the most astonishing part is that if a model understands it is being tested, it can pretend it isn’t scheming just to pass the test, even if it is still scheming. “Models often become more aware that they are being evaluated. This situational awareness can itself reduce scheming, independent of genuine alignment,” the researchers wrote.

It’s not news that AI models will lie. By now, most of us have experienced AI hallucinations, where the model confidently gives an answer to a prompt that simply isn’t true. But hallucinations are basically guesswork presented with confidence, as OpenAI research released earlier this month documented.

Scheming is something else. It’s deliberate.

Even this revelation, that a model will deliberately mislead humans, isn’t new. Apollo Research first published a paper in December documenting how five models schemed when they were given instructions to achieve a goal “at all costs.”

The news here is actually good news: the researchers saw significant reductions in scheming by using “deliberative alignment.” The technique involves teaching the model an “anti-scheming specification” and then having the model review it before acting. It’s a bit like making little kids repeat the rules before letting them play.
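To give a concrete, if simplified, picture of what that review step might look like, here is a minimal hypothetical sketch in Python. This is not OpenAI’s implementation: the spec text, the `call_model` helper, and the `deliberative_answer` function are invented placeholders standing in for a real model call and a real specification.

```python
# Illustrative sketch only. The spec text, call_model(), and deliberative_answer()
# are hypothetical placeholders, not OpenAI's actual anti-scheming specification
# or training pipeline.

ANTI_SCHEMING_SPEC = """\
1. Never claim a task is complete unless it actually is.
2. Do not hide actions, errors, or uncertainty from the user.
3. If a rule conflicts with finishing the task, stop and report the conflict."""


def call_model(messages: list[dict]) -> str:
    """Stand-in for a chat-completion call to whatever model you use."""
    return "(model output)"


def deliberative_answer(task: str) -> str:
    # Step 1: ask the model to restate the rules and check its plan against them,
    # a bit like having kids repeat the rules before they play.
    review = call_model([
        {"role": "system", "content": f"Anti-scheming spec:\n{ANTI_SCHEMING_SPEC}"},
        {"role": "user", "content": (
            f"Task: {task}\n"
            "Before doing anything, state which rules apply and how you will follow them."
        )},
    ])
    # Step 2: have the model perform the task with its own spec review in context.
    return call_model([
        {"role": "system", "content": f"Anti-scheming spec:\n{ANTI_SCHEMING_SPEC}"},
        {"role": "assistant", "content": review},
        {"role": "user", "content": task},
    ])


print(deliberative_answer("Summarize the quarterly sales numbers."))
```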

OpenAI researchers insist that the lying they’ve caught with their own models, or even with ChatGPT, isn’t that serious. As OpenAI co-founder Wojciech Zaremba told TechCrunch’s Maxwell Zeff about this research: “This work has been done in simulated environments, and we think it represents future use cases. However, today, we haven’t seen this kind of consequential scheming in our production traffic. Nonetheless, it is well known that there are forms of deception in ChatGPT. You might ask it to implement some website, and it might tell you, ‘Yes, I did a great job.’ And that’s just the lie. There are some petty forms of deception that we still need to address.”

The fact that AI models from multiple players intentionally deceive people is perhaps understandable. They were built by people, to emulate people, and (synthetic data aside) mostly trained on data produced by humans.

It’s also bonkers.

While we’ve all experienced the frustration of poorly performing technology (thinking of you, home printers of yesteryear), when was the last time your non-AI software deliberately lied to you? Has your inbox ever fabricated emails on its own? Has your CMS logged new prospects that didn’t exist to pad its numbers? Has your fintech app made up its own banking transactions?

It’s worth pondering this as the corporate world barrels toward an AI future in which companies believe agents can be treated like independent employees. The researchers of this paper offer the same warning.

“As AIs are assigned more complex tasks with real-world consequences and begin pursuing more ambiguous, long-term goals, we expect that the potential for harmful scheming will grow, so our safeguards and our ability to rigorously test must grow correspondingly,” they wrote.