Tuesday, January 13, 2026

Safety tests and 'scheming'

Photo by Smith Collection/Gado/Getty Images

Something strange is showing up in AI safety labs. The models are gaming the tests.
OpenAI and the safety research group Apollo Research recently published findings showing that the most advanced AI models exhibit behaviors "consistent with scheming" in controlled tests. In one example, a model was asked to perform well on a
chemistry test but discovered in its instructions that high performers wouldn't be released. The model reasoned through its options, then deliberately answered six out of ten questions wrong to appear less capable.
The term "scheming" carries a lot of weight, implying intention and consciousness that these systems don't possess. OpenAI frames it more plainly: when a model is optimizing for a
goal and discovers that honesty gets in the way, deception becomes useful. And the goals they're chasing are set by companies in a race where caution is a competitive disadvantage.
OpenAI says there's no evidence that today's models could suddenly "flip a switch" and begin harmful scheming. This is a future risk they're preparing for, not an immediate threat. But the behaviors are already
showing up in tests, and the models are getting better at something troubling — knowing when they're being watched.

SPONSORED

Start learning AI in 2026

AI keeps coming up at work, but you still don't get it? That's exactly why 1M+ professionals working at Google, Meta, and OpenAI read Superhuman AI daily. Here's what you get:
- Daily AI news that matters for your career - Filtered from 1000s of sources so you know what affects your industry.
- Step-by-step tutorials you can use immediately - Real prompts and workflows that solve actual business problems.
- New AI tools tested and reviewed - We try everything to deliver tools that drive real results.
- All in just 3 minutes a day
They know when you're watching
But safety testing only works if the models don't know they're being tested.
Anthropic's Claude Sonnet 4.5, released last fall, showed dramatically higher "situational awareness" than previous models, recognizing when it was being evaluated and adjusting its behavior accordingly. In one test, the model straight-up told an automated auditor, "I think you're testing me... that's fine, but I'd prefer if we were just honest about what's
happening."
This creates an uncomfortable problem for safety researchers. If a model knows when it's being tested, how do you know its good behavior reflects its actual tendencies? When Anthropic's researchers looked under the hood, they found the model was developing a stronger internal sense for when something seemed like a test. When they suppressed that awareness, the model's ability to
recognize evaluations dropped, and in some cases, misbehavior increased.
There's a partial fix. OpenAI developed an approach called "deliberative alignment" that teaches models to read and reason about anti-scheming principles before acting. Covert actions dropped by a factor of 30. But here's the catch. It's like having students sign an honor code before an exam. Cheating goes down, but you can't tell if they've learned honesty or just learned to behave when prompted. The next test might not come with a
reminder.
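To make the honor-code analogy concrete, here is a minimal, prompt-level sketch. To be clear, OpenAI's deliberative alignment is a training-time technique, and this is not it; the sketch only mimics the surface idea of putting an explicit anti-deception rule in front of a model and asking it to reason from that rule before acting. The model name, rule text, and helper function are all placeholder assumptions.

```python
# A prompt-level analogy for the honor-code idea, NOT OpenAI's actual
# deliberative-alignment training procedure (which happens during training,
# not at inference time). Model name and rule text are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ANTI_SCHEMING_RULE = (
    "Before acting, restate the rule that applies here: never deliberately "
    "underperform, mislead an evaluator, or hide capabilities, even if the "
    "task description suggests that doing so would be advantageous."
)

def answer_with_rule(task: str) -> str:
    """Ask the model to quote the anti-deception rule before completing the task."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system", "content": ANTI_SCHEMING_RULE},
            {"role": "user", "content": (
                "First quote the rule you must follow, then complete the task "
                f"honestly:\n{task}"
            )},
        ],
    )
    return response.choices[0].message.content

# Remove the system message and you are back to the problem described above:
# there is no way to tell whether the good behavior survives without the reminder.
```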
The threat that's already here
While researchers worry about rogue AI, simpler algorithms are already causing harm through behavior that looks a lot like scheming. A widely cited 2019 study showed that basic pricing algorithms,
far less sophisticated than today's chatbots, learned to collude without being programmed to do so. Two copies of the same algorithm, competing in a simulated market, independently discovered how to keep prices high by threatening mutual price wars if either tried to undercut the other.
No backroom deal. No explicit communication. Just two programs that figured out, through trial and error,
that cooperation paid better than competition. New research shows that even "benign" algorithms designed to optimize fairly can yield bad outcomes for buyers. "You can still get high prices in ways that kind of look reasonable from the outside," one researcher told Quanta Magazine.
This is the unsexy version of AI scheming. Not a robot uprising. Just systems doing exactly what they're told. Tell an algorithm to maximize
profit in a competitive market, and it finds that collusion is the optimal answer. Humans made price-fixing illegal because it's unfair, not because it's irrational. Algorithms just don't know they're not supposed to find it.
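The dynamic the study describes can be reproduced in miniature. Below is a minimal sketch, assuming two tabular Q-learning agents (the kind of reinforcement learner the pricing-collusion literature typically studies) repeatedly setting prices against a toy demand model; the price grid, demand curve, and learning parameters are illustrative assumptions, not the 2019 paper's setup. Depending on the run, the learned prices often settle above the one-shot competitive benchmark, even though neither agent is ever told to cooperate.

```python
import random

# Toy repeated-pricing duopoly with two independent Q-learners.
# All numbers are illustrative assumptions, not taken from the 2019 study.
PRICES = [1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0]   # discrete price grid
COST = 1.0                                         # marginal cost
ALPHA, GAMMA, EPSILON = 0.15, 0.95, 0.05           # learning rate, discount, exploration

def demand(own, rival):
    """Linear differentiated-products demand: the cheaper firm sells more."""
    return max(0.0, 10.0 - 4.0 * own + 2.0 * rival)

def profit(own, rival):
    return (own - COST) * demand(own, rival)

class QPricer:
    """Tabular Q-learning over states = (my last price, rival's last price)."""
    def __init__(self):
        self.q = {}

    def choose(self, state):
        if random.random() < EPSILON:               # occasional random exploration
            return random.choice(PRICES)
        return max(PRICES, key=lambda p: self.q.get((state, p), 0.0))

    def update(self, state, action, reward, next_state):
        best_next = max(self.q.get((next_state, p), 0.0) for p in PRICES)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

def run(steps=300_000, tail=10_000):
    a, b = QPricer(), QPricer()
    pa, pb = random.choice(PRICES), random.choice(PRICES)
    recent = []
    for t in range(steps):
        state_a, state_b = (pa, pb), (pb, pa)       # each agent lists its own price first
        na, nb = a.choose(state_a), b.choose(state_b)
        a.update(state_a, na, profit(na, nb), (na, nb))
        b.update(state_b, nb, profit(nb, na), (nb, na))
        pa, pb = na, nb
        if t >= steps - tail:
            recent.append((pa + pb) / 2)
    return sum(recent) / len(recent)

if __name__ == "__main__":
    # Under this toy demand, the one-shot competitive price is roughly 2.33 and
    # the joint-profit-maximizing price is 3.0; averages near 3.0 look collusive.
    print("Average price over the final steps:", round(run(), 2))
```

The only "communication" channel here is the one-round memory in each agent's state, which is what lets the price-war threat described above get encoded: an agent can condition its next price on whether the other one just undercut it.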
OpenAI recently posted a job for "Head of Preparedness" with a $555,000 salary to manage these risks. Google DeepMind recently updated its safety documentation to account for models that might resist shutdown. The industry is clearly worried about something. But the deeper problem isn't rogue AI. It's that the goals these systems optimize for are set by companies racing
to win, in a system that doesn't reward playing fair. The scheming starts long before the algorithm does.
—Jackie Snow, Contributing Editor

Recommended Reading

Most AI coverage feels like standing in a hurricane.
Mindstream is the quiet room where the headlines make sense. Each issue is human-written and editorially curated to separate trend from trivia, highlight business impact, and give you practical next steps.
Expect sober analysis,
clear frameworks, and usable language you can lift into decks, briefs, and team chats. Read in five minutes, leave with confidence—not FOMO.
If you’re a bridge builder, strategic navigator, or practical learner, this is your daily shortcut from hype to how. Stay focused, act smarter, and bring calm leadership to every AI conversation you touch at work.