Content by Mark Russinovich, Giorgio Severi, Blake Bullwinkel, Yanan Cai, Keegan Hines and Ahmed Salem (1)

A One-Prompt Attack That Breaks LLM Safety Alignment

Mark Russinovich, Giorgio Severi, Blake Bullwinkel, Yanan Cai, Keegan Hines, and Ahmed Salem investigate how quickly the safety alignment of modern language and diffusion models can be compromised, revealing the fragility of current defense approaches.
News

End of content

Rejoining the server...

Rejoin failed... trying again in seconds.

Failed to rejoin.
Please retry or reload the page.

The session has been paused by the server.

Failed to resume the session.
Please reload the page.