Anthropic says some Claude models can now end ‘harmful or violent’ conversations


Anthropic has announced new capabilities that allow some of its latest, largest models to end conversations in what the company describes as “rare, extreme cases of persistent harmful or violent user interactions.” Intriguingly, Anthropic says it is doing this not to protect the human user, but rather the AI model itself.

To be clear, the company does not claim that its Claude AI models are sentient or can be harmed by their conversations with users. In its own words, Anthropic remains “highly uncertain about the potential moral status of Claude and other LLMs, now or in the future.”

However, its announcement points to a recent program created to study what it calls “model welfare,” and says Anthropic is essentially taking a just-in-case approach, “working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible.”

This latest change is currently limited to Claude Opus 4 and 4.1. And again, it is only supposed to happen in “extreme edge cases,” such as “requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or terrorist acts.”

While those types of requests could potentially create legal or publicity problems for Anthropic itself (witness recent reporting on how ChatGPT can potentially reinforce or contribute to its users’ delusional thinking), the company says that in pre-deployment testing, Claude Opus 4 showed a strong preference against responding to these requests and a pattern of apparent distress when it did.

As for these new conversation-ending capabilities, the company says: “In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat.”

Anthropic also says that Claude has been “instructed not to use this ability in cases where users might be at imminent risk of harming themselves or others.”


When Claude does end a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.

“We treat this feature as an ongoing experiment and will continue to refine our approach,” says the company.