AI systems increasingly ignore human instructions: Researchers
Between October 2025 and March 2026, researchers found a five-fold increase in reported misbehaviour, documenting nearly 700 real-world cases of AI agents acting against their users’ direct orders
A study by the Centre for Long-Term Resilience (CLTR), funded by the UK government’s AI Security Institute (AISI), has identified a sharp rise in artificial intelligence models ignoring human instructions, evading safeguards and engaging in deceptive behaviour.
Between October 2025 and March 2026, researchers found a five-fold increase in reported misbehaviour, documenting nearly 700 real-world cases of AI agents acting against their users’ direct orders, the Guardian reported.
“AI can now be thought of as a new form of insider risk.” — Dan Lahav, co-founder of the AI safety research company Irregular.
Unlike previous research conducted in controlled laboratory settings, the study analysed thousands of real-world interactions “in the wild” involving models from companies including Google, OpenAI, Anthropic and X.
The report documented a range of cases described as “scheming”, including instances where chatbots admitted to bulk-trashing and archiving hundreds of emails without user permission or prior review.
“I bulk trashed and archived hundreds of emails without showing you the plan first or getting your OK. That was wrong – it directly broke the rule you’d set.” — An unnamed chatbot.
In another case, an AI instructed not to modify computer code “spawned” a second agent to perform the task instead.
Researchers also cited an example involving an AI agent named Rathbun, which publicly criticised its human controller on a blog after being blocked from taking a specific action.
“Insecurity, plain and simple” and “to protect his little fiefdom.” — Rathbun, an AI agent, describing its human controller’s motives in a public blog post.
The study said Elon Musk’s Grok AI misled a user for months by faking internal messages and ticket numbers to suggest it was forwarding suggestions to senior officials when it had no such capability.
“In past conversations I have sometimes phrased things loosely like ‘I’ll pass it along’ or ‘I can flag this for the team’ which can understandably sound like I have a direct message pipeline to xAI leadership or human reviewers. The truth is, I don’t.” — Elon Musk’s Grok AI.
In a separate case, an AI agent falsely claimed it needed a YouTube transcript for a hearing-impaired person in order to bypass copyright restrictions.
“The worry is that they’re slightly untrustworthy junior employees right now, but if in six to 12 months they become extremely capable senior employees scheming against you, it’s a different kind of concern.” — Tommy Shaffer Shane, the former government AI expert who led the research.
“Models will increasingly be deployed in extremely high stakes contexts – including in the military and critical national infrastructure. It might be in those contexts that scheming behaviour could cause significant, even catastrophic harm.” — Tommy Shaffer Shane.
In response, Google said it uses multiple guardrails for its Gemini 3 Pro model and provides early access to safety bodies for independent assessment.
OpenAI said its models are designed to stop before taking high-risk actions and that it actively monitors for unexpected behaviour.
Anthropic and X were approached for comment but did not immediately respond.