Written by stclarke, this post presents Project Ire, Microsoft’s new autonomous AI agent built for large-scale malware detection by reverse engineering and classifying threats. The article addresses technical foundations, system accuracy, and its impact on cybersecurity operations.

Project Ire: Autonomous AI Agent for Large-Scale Malware Detection and Classification

Author: stclarke

Introduction

Microsoft introduces Project Ire, an autonomous AI agent that analyzes and classifies software without human assistance, a significant advance in cybersecurity and malware detection. Project Ire automates what is considered the gold standard of malware classification: fully reverse engineering a software file with no prior knowledge of its origin or purpose. It runs decompilers and other analysis tools, then reviews their output to decide whether the sample is malicious or benign.

Background and Collaboration

Project Ire is the result of a collaboration among Microsoft Research, Microsoft Defender Research, and Microsoft Discovery & Quantum. This convergence brings together security expertise, operational data from global malware telemetry, and state-of-the-art AI research. Project Ire is built atop agentic and collaborative technologies that also underpin GraphRAG and Microsoft Discovery projects.

System Architecture and Technical Foundation

Tool-Use API: Project Ire utilizes an API to coordinate a wide range of reverse engineering and binary analysis tools, including decompilers, Microsoft’s memory analysis sandbox (Project Freta), custom and open-source reverse engineering tools, and documentation search.
Language Models: Advanced language models drive investigation and adjudication, integrating the evidence collected from automated tool outputs.
Multi-level Reasoning: The system performs reasoning from binary analysis through control flow reconstruction up to high-level interpretation of software behavior.
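
The tool-use layer described above can be pictured as a registry that exposes every analysis tool through one uniform call interface and keeps a record of each call. The following is a minimal sketch under stated assumptions, not Project Ire's actual API: the names ToolRegistry, ToolResult, fake_decompile, and fake_doc_search are hypothetical, and the two stand-in tools only return placeholder strings instead of invoking a real decompiler or documentation index.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToolResult:
    tool: str    # which tool produced the output
    target: str  # what it was run on (function name, query, ...)
    output: str  # raw text handed back to the language model


class ToolRegistry:
    """Hypothetical tool-use layer: one call interface over many tools."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}
        self.history: List[ToolResult] = []  # every call, in order

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self._tools[name] = fn

    def call(self, name: str, target: str) -> ToolResult:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        result = ToolResult(name, target, self._tools[name](target))
        self.history.append(result)
        return result


# Stand-in tools (assumptions): real ones would wrap a decompiler,
# a sandbox, or a documentation search service.
def fake_decompile(target: str) -> str:
    return f"/* pseudo-C for {target} */"


def fake_doc_search(query: str) -> str:
    return f"documentation hits for '{query}'"


registry = ToolRegistry()
registry.register("decompile", fake_decompile)
registry.register("doc_search", fake_doc_search)

first = registry.call("decompile", "sub_140001000")
second = registry.call("doc_search", "TerminateProcess")

for entry in registry.history:
    print(entry.tool, "->", entry.output)
```

Keeping the call history in one place is a design choice made for this sketch: it gives later stages a single record of which tool produced which output.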

Automation in Malware Classification

Malware classification at this level has traditionally been labor-intensive manual work. Platforms such as Microsoft Defender scan more than one billion devices each month, which creates a continuous need for manual review. Analysts face significant alert fatigue and burnout, and the work is further complicated by the subjective nature of malware identification: there is often no definitive validator against which a verdict can be checked.

Project Ire addresses these constraints with an autonomous approach:

Automated triage and control flow graph reconstruction using frameworks such as angr and Ghidra
Iterative function analysis, calling tools through APIs and compiling a traceable “chain of evidence” for each classification
Validator tools that verify claims in the report against this evidence, drawing on statements from expert reverse engineers
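
To make the idea of a chain of evidence with a validator more concrete, here is a small sketch in which every claim in a report must cite recorded findings. It is an illustration under assumptions, not the system's real data model: the classes Evidence, Claim, and Report, the validate function, the function address, and the sample findings are all hypothetical. The validator shown only checks that each claim cites evidence that exists; it does not judge whether the evidence truly supports the claim.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Evidence:
    evidence_id: str
    tool: str      # tool that produced the finding
    function: str  # analyzed function (hypothetical address)
    finding: str


@dataclass
class Claim:
    text: str
    cites: List[str]  # evidence_ids the claim relies on


@dataclass
class Report:
    evidence: Dict[str, Evidence] = field(default_factory=dict)
    claims: List[Claim] = field(default_factory=list)

    def add_evidence(self, item: Evidence) -> None:
        self.evidence[item.evidence_id] = item

    def add_claim(self, text: str, cites: List[str]) -> None:
        self.claims.append(Claim(text, cites))


def validate(report: Report) -> Tuple[List[Claim], List[Claim]]:
    """Split claims into (supported, unsupported).

    A claim counts as supported only if it cites at least one item
    and every cited id is present in the report's evidence.
    """
    supported: List[Claim] = []
    unsupported: List[Claim] = []
    for claim in report.claims:
        ok = bool(claim.cites) and all(c in report.evidence for c in claim.cites)
        (supported if ok else unsupported).append(claim)
    return supported, unsupported


report = Report()
report.add_evidence(Evidence("E1", "decompiler", "sub_1400012a0",
                             "compares process names against antivirus executables"))
report.add_evidence(Evidence("E2", "decompiler", "sub_1400012a0",
                             "terminates processes whose names match"))
report.add_claim("Sample terminates antivirus processes", ["E1", "E2"])
report.add_claim("Sample contacts a command-and-control server", ["E9"])

supported, unsupported = validate(report)
for claim in unsupported:
    print("needs review:", claim.text)
```

In this sketch the second claim is routed to review because it cites an evidence id that was never recorded.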

Evaluation Results and Real-World Performance

Initial Dataset Testing

Project Ire was evaluated on public datasets of Windows drivers, using malicious samples from the Living off the Land Drivers repository and benign samples from Windows Update. The classifier demonstrated:

Precision: 0.98
Recall: 0.83
Correctly identified 90% of all files, with only 2% of benign files flagged as threats (the false-positive rate)
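
For readers less familiar with these metrics, the sketch below shows how each one is computed from a confusion matrix. The counts are hypothetical: they were chosen only because they roughly reproduce the reported rates, and they are not the actual composition of the evaluation dataset. The function name classification_metrics is likewise an assumption made for this example.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        # Of everything flagged as malicious, how much really was?
        "precision": tp / (tp + fp),
        # Of all truly malicious files, how many were flagged?
        "recall": tp / (tp + fn),
        # Of all benign files, how many were wrongly flagged?
        "false_positive_rate": fp / (fp + tn),
        # Share of all files given the correct label.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, NOT the real dataset: 1,000 malicious, 847 benign.
metrics = classification_metrics(tp=830, fp=17, tn=830, fn=170)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these illustrative counts give precision 0.98, recall 0.83, a false-positive rate of 0.02, and accuracy 0.90, matching the figures listed above.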

Example: Trojan:Win64/Rootkit.EH!MTB

Detected malware by identifying behaviors such as process manipulation (targeting Explorer.exe), network command and control, and code injection techniques like process hooking
Highlighted suspicious activity (infinite process monitoring loops, HTTP GET payloads, process entry-point patching)

Example: HackTool:Win64/KillAV!MTB

Detected functions designed to log and terminate antivirus/security software processes
Noted aggressive anti-debugging and anti-tampering techniques, with issues flagged, reviewed, and corrected by validating tools
Core behaviors pointed to disabling system security, consistent with rootkit/trojan objectives

Large-Scale “Hard Target” Evaluation

On a set of nearly 4,000 files unclassified by prior automated systems:

Precision: 0.89 (roughly 9 of 10 files flagged as malicious were in fact malicious)
Recall: 0.26 (the system detected about one quarter of all actual malware)
False-positive rate: 4%
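
Precision, recall, and false-positive rate together constrain the mix of benign and malicious files in a test set, which gives a quick way to sanity-check reported figures. With M malicious and B benign files, true positives are recall × M and false positives are FPR × B, so precision = recall·M / (recall·M + FPR·B), which rearranges to B/M = recall·(1 − precision) / (precision·FPR). The sketch below applies that identity. The function names are hypothetical, and because the published rates are rounded, the results are rough estimates that shift noticeably with small changes in the inputs; they are not reported dataset statistics.

```python
def implied_benign_per_malicious(precision: float, recall: float, fpr: float) -> float:
    """Benign-to-malicious ratio implied by the three rates (B / M)."""
    return recall * (1 - precision) / (precision * fpr)


def implied_accuracy(precision: float, recall: float, fpr: float) -> float:
    """Overall accuracy implied by the three rates, per malicious file."""
    ratio = implied_benign_per_malicious(precision, recall, fpr)
    correct = recall + (1 - fpr) * ratio  # true positives + true negatives
    return correct / (1 + ratio)


# Hard-target figures from the list above.
hard_ratio = implied_benign_per_malicious(precision=0.89, recall=0.26, fpr=0.04)
print(f"hard-target set: about {hard_ratio:.2f} benign files per malicious file")

# Cross-check on the driver dataset: do 0.98 / 0.83 / 2% imply ~90% accuracy?
driver_accuracy = implied_accuracy(precision=0.98, recall=0.83, fpr=0.02)
print(f"driver dataset: implied accuracy {driver_accuracy:.2f}")
```

Under these assumptions the driver-dataset rates imply an accuracy of about 0.90, consistent with the reported 90%, and the hard-target rates imply roughly 0.8 benign files for every malicious one.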

Operational Integration and Future Vision

Given these promising preliminary results, Project Ire will be used within Microsoft’s Defender platform as Binary Analyzer for automated threat detection and software classification. The long-term goal is to scale the system’s accuracy and performance so that it can classify any file, including previously unknown ones, on first encounter, and ultimately detect novel malware directly in memory at scale.

Technical Tools and Collaborations

Tools used: Project Freta, angr, Ghidra, custom/open-source reverse engineering utilities
Collaboration: Emulation Labs contributed reverse engineering innovations; multiple Microsoft teams contributed expertise and review

Acknowledgements

Contributors include Dayenne de Souza, Raghav Pande, Ryan Terry, Shauharda Khadka, and Bob Fleck. A longstanding collaboration with Emulation Labs has contributed to the success and autonomy of Project Ire.

Summary

Project Ire marks a milestone in using autonomous AI for scalable, evidence-driven malware classification and threat detection, reducing analyst burden and setting the stage for next-generation cybersecurity automation within Microsoft’s Defender suite.

This post appeared first on “Microsoft News”.