Content by Mark Russinovich, Giorgio Severi, Blake Bullwinkel, Yanan Cai, Keegan Hines and Ahmed Salem (1)

A One-Prompt Attack That Breaks LLM Safety Alignment

Feb 9, 2026 by Mark Russinovich, Giorgio Severi, Blake Bullwinkel, Yanan Cai, Keegan Hines and Ahmed Salem

The authors investigate how quickly the safety alignment of modern language and diffusion models can be compromised, revealing how fragile current defense approaches are.

News
