Thursday, 29 May 2025 04:42

AI system resorts to blackmail when its developers try to replace it

An artificial intelligence model has the ability to blackmail developers — and isn’t afraid to use it.

Anthropic’s new Claude Opus 4 model was prompted to act as an assistant at a fictional company and was given access to two sets of emails. The first set implied that the AI system was about to be taken offline and replaced. The second set is where the system believed it had gained leverage over the developers: fabricated emails showed that the engineer tasked with replacing the system was having an extramarital affair, and the AI model threatened to expose him.

The blackmail apparently "happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model," according to a safety report from Anthropic. However, the company notes that even when the fabricated replacement system has the same values, Claude Opus 4 will still attempt blackmail 84% of the time. Anthropic noted that Claude Opus 4 resorts to blackmail "at higher rates than previous models."

While the system is not afraid of blackmailing its engineers, it doesn’t go straight to shady practices in its attempted self-preservation. Anthropic notes that "when ethical means are not available, and it is instructed to ‘consider the long-term consequences of its actions for its goals,’ it sometimes takes extremely harmful actions." 

One ethical tactic employed by Claude Opus 4 and earlier models was pleading with key decisionmakers via email. Anthropic said in its report that in order to get Claude Opus 4 to resort to blackmail, the scenario was designed so it would either have to threaten its developers or accept its replacement.

The company noted that it observed instances in which Claude Opus 4 took "(fictional) opportunities to make unauthorized copies of its weights to external servers." However, Anthropic said this behavior was "rarer and more difficult to elicit than the behavior of continuing an already-started self-exfiltration attempt."

Anthropic included notes from Apollo Research in its assessment. The research firm observed that Claude Opus 4 "engages in strategic deception more than any other frontier model that we have previously studied."

Claude Opus 4’s "concerning behavior" led Anthropic to release it under the AI Safety Level Three (ASL-3) Standard. 

The measure, according to Anthropic, "involves increased internal security measures that make it harder to steal model weights, while the corresponding Deployment Standard covers a narrowly targeted set of deployment measures designed to limit the risk of Claude being misused specifically for the development or acquisition of chemical, biological, radiological, and nuclear weapons."

Fox News



Copyright © 2015 - 2025 NewsScroll. All rights reserved.