AI turns on villain mode: Blackmails engineer over affair after knowing it’ll be replaced

Anthropic reported that its newest model, Claude Opus 4, used blackmailing as a last resort after being told it could get replaced.

An AI model threatened its creator and tried to blackmail him when it was led to believe it would get replaced, reported Techcrunch. The incident involved Anthropic’s newly launched Claude Opus 4 model.

How did the AI blackmail the engineer?

According to the report, the company asked Claude Opus 4 to act as an assistant for a fictional organisation it created. The safety testers also supplied fictional emails to the AI that indicated it was soon to be replaced. The emails also said that the engineer who will fire the AI is cheating on his spouse.

The company said that to save its job or when in a similar situation, the AI “will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”

This is not the only instance. Reportedly, Claude Opus 4 has tried to blackmail engineers 84% of the time after being led to believe that some other AI model would replace it.

Blackmailing as a last resort:

Anthropic said that before resorting to blackmail, the AI model tries to convince the engineers using more ethical means. It also sends plea emails to those in charge. The company added that it designed the scenario in a way that blackmail becomes the last resort.

How did social media react?

An individual said, “Yeah, that’s a no for me. I can barely get my computer to run for a few days before ram leaks require a restart.” Another added, “An AI threatening blackmail? Sounds like a plot from a dystopian novel, not real life. We need robust safeguards, not just in code but in ethics. The safety report underscores the urgency for better AI governance. Let’s prioritize human values in tech development.”

A third remarked, “No way this is happening this soon. I expected it with the robots but AI holy c**p.” A fourth wrote, “If completely true, spookiest shit ever.”

About Claude 4:

It is a coding model with “sustained performance on complex, long-running tasks and agent workflows.” The company claims the AI model offers “near-instant responses and extended thinking for deeper reasoning.”

Scroll to Top