Project Ire: Autonomous AI Agent for Large-Scale Malware Detection and Classification
Written by stclarke, this post presents Project Ire, Microsoft’s new autonomous AI agent that detects malware at scale by reverse engineering and classifying software. The article covers the system’s technical foundations, its accuracy, and its impact on cybersecurity operations.
Author: stclarke
Introduction
Microsoft introduces Project Ire, an autonomous AI agent that analyzes and classifies software without manual assistance, a significant advance in cybersecurity and malware detection. Project Ire automates the gold standard of malware classification: fully reverse engineering a software file whose origin and purpose are unknown. It runs decompilers and other analysis tools, then reviews their output to determine whether a sample is malicious or benign.
Background and Collaboration
Project Ire is the result of a collaboration among Microsoft Research, Microsoft Defender Research, and Microsoft Discovery & Quantum. This convergence brings together security expertise, operational data from global malware telemetry, and state-of-the-art AI research. Project Ire is built atop agentic and collaborative technologies that also underpin GraphRAG and Microsoft Discovery projects.
System Architecture and Technical Foundation
- Tool-Use API: Project Ire utilizes an API to coordinate a wide range of reverse engineering and binary analysis tools, including decompilers, Microsoft’s memory analysis sandbox (Project Freta), custom and open-source reverse engineering tools, and documentation search.
- Language Models: Advanced language models drive investigation and adjudication, integrating the evidence collected from automated tool outputs.
- Multi-level Reasoning: The system reasons at several levels, from low-level binary analysis through control flow reconstruction up to high-level interpretation of software behavior.
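The tool-use idea in the list above can be illustrated with a minimal sketch. This is not Project Ire's actual API, which the post does not describe. The names ToolRegistry, ToolResult, fake_decompile, and fake_doc_search are hypothetical, and the registered tools are stand-ins that return placeholder text instead of invoking a real decompiler or sandbox.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ToolResult:
    tool: str    # name of the tool that produced the output
    target: str  # what the tool was run on (for example, a function name)
    output: str  # raw text that a language model would read next


class ToolRegistry:
    """Maps tool names to callables so an agent can invoke them uniformly."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self._tools[name] = fn

    def call(self, name: str, target: str) -> ToolResult:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return ToolResult(tool=name, target=target, output=self._tools[name](target))


# Hypothetical stand-ins: a real system would wrap a decompiler, a sandbox, etc.
def fake_decompile(target: str) -> str:
    return f"/* decompiled body of {target} */"


def fake_doc_search(target: str) -> str:
    return f"documentation hits for {target}"


registry = ToolRegistry()
registry.register("decompile", fake_decompile)
registry.register("doc_search", fake_doc_search)

result = registry.call("decompile", "sub_401000")
print(result.tool, "->", result.output)
```

Routing every tool through one call signature is a design choice of this sketch: it lets the component that decides what to inspect next stay independent of how each tool is implemented.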
Automation in Malware Classification
Malware classification at this level has traditionally been labor-intensive manual work. Platforms such as Microsoft Defender scan more than one billion devices each month, which creates a continuous need for manual review. Analysts face significant alert fatigue and burnout, and the work is further complicated because malware identification is partly subjective and has no definitive validator.
Project Ire addresses these constraints with an autonomous approach:
- Automated triage and control flow graph reconstruction using frameworks such as angr and Ghidra
- Iterative function analysis that calls tools through APIs and compiles a traceable “chain of evidence” for each classification
- Validator tools that check the claims in the report against this evidence, drawing on statements from expert reverse engineers
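The chain-of-evidence and validator steps above can be sketched as a small data structure. This is a simplified illustration under stated assumptions, not the system's implementation: the classes Evidence, Claim, and Report, the helpers analyze and unsupported_claims, and the toy tool are all hypothetical. The check here only confirms that each claim cites evidence that exists in the log, whereas the validator described in the post also draws on expert statements.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Evidence:
    evidence_id: int
    function: str  # function the tool was run on
    tool: str      # tool that produced the finding
    finding: str   # what the tool reported


@dataclass
class Claim:
    text: str
    evidence_ids: List[int]  # evidence records this claim cites


@dataclass
class Report:
    evidence: List[Evidence] = field(default_factory=list)
    claims: List[Claim] = field(default_factory=list)

    def record(self, function: str, tool: str, finding: str) -> int:
        """Append one evidence record and return its id."""
        evidence_id = len(self.evidence)
        self.evidence.append(Evidence(evidence_id, function, tool, finding))
        return evidence_id


def analyze(functions: List[str], tool_name: str,
            tool: Callable[[str], str], report: Report) -> None:
    """Run one tool over each function and log every result as evidence."""
    for name in functions:
        report.record(name, tool_name, tool(name))


def unsupported_claims(report: Report) -> List[Claim]:
    """Return claims that cite no evidence or cite ids missing from the log."""
    known = {e.evidence_id for e in report.evidence}
    return [c for c in report.claims
            if not c.evidence_ids or not set(c.evidence_ids) <= known]


# Hypothetical tool: pretends one function calls a process-termination API.
def toy_tool(function_name: str) -> str:
    return "calls TerminateProcess" if function_name == "sub_2000" else "no notable calls"


report = Report()
analyze(["sub_1000", "sub_2000"], "toy_decompiler", toy_tool, report)
report.claims.append(Claim("Terminates other processes", [1]))
report.claims.append(Claim("Uses anti-debugging checks", []))  # cites nothing

for claim in unsupported_claims(report):
    print("unsupported:", claim.text)  # prints: unsupported: Uses anti-debugging checks
```

Because every evidence record keeps the function and tool that produced it, a reviewer can trace any claim in the report back to the tool output it rests on.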
Evaluation Results and Real-World Performance
Initial Dataset Testing
Project Ire was evaluated on public datasets of Windows drivers, using malicious samples from the Living off the Land Drivers repository and benign samples from Windows Update. The classifier demonstrated:
- Precision: 0.98
- Recall: 0.83
- Accuracy: 90% of all files correctly identified, with only 2% of benign files flagged as threats
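To make the relationship between these figures concrete, the sketch below computes them from a confusion matrix. The counts are hypothetical: the post does not give the dataset size or class balance, so an even split of 100 malicious and 100 benign files is assumed purely because it roughly reproduces the reported numbers. It also assumes the 2% figure is the share of benign files flagged as malicious.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        # Of the files flagged as malicious, how many really were malicious
        "precision": tp / (tp + fp),
        # Of the truly malicious files, how many were flagged
        "recall": tp / (tp + fn),
        # Share of all files classified correctly
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Share of benign files wrongly flagged as malicious
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical counts (NOT the real dataset): 100 malicious, 100 benign files.
metrics = classification_metrics(tp=83, fp=2, tn=98, fn=17)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, precision is 83/85 (about 0.98), recall is 83/100 (0.83), accuracy is 181/200 (about 90%), and the false positive rate is 2/100 (2%).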
Example: Trojan:Win64/Rootkit.EH!MTB
- Detected malware by identifying behaviors such as process manipulation (targeting Explorer.exe), network command and control, and code injection techniques like process hooking
- Highlighted suspicious activity (infinite process monitoring loops, HTTP GET payloads, process entry-point patching)
Example: HackTool:Win64/KillAV!MTB
- Detected functions designed to log and terminate antivirus/security software processes
- Noted aggressive anti-debugging and anti-tampering techniques, with issues flagged, reviewed, and corrected by validating tools
- Core behaviors pointed to disabling system security, consistent with rootkit/trojan objectives
Large-Scale “Hard Target” Evaluation
On a set of nearly 4,000 files unclassified by prior automated systems:
- Precision: 0.89 (nearly 9 of 10 files flagged as malicious were in fact malicious)
- Recall: 0.26 (the system detected about a quarter of the actual malware)
- False positive rate: only 4%
Operational Integration and Future Vision
Given these promising preliminary results, Project Ire will be used within Microsoft’s Defender platform as Binary Analyzer for automated threat detection and software classification. The long-term goal is to scale accuracy and performance so that the system can classify any file, including previously unseen ones, on first encounter, and ultimately to detect novel malware directly in memory at scale.
Technical Tools and Collaborations
- Tools used: Project Freta, angr, Ghidra, custom/open-source reverse engineering utilities
- Collaboration: Emotion Labs contributed reverse engineering innovations; multiple Microsoft teams contributed expertise and review
Acknowledgements
Contributors include Dayenne de Souza, Raghav Pande, Ryan Terry, Shauharda Khadka, and Bob Fleck. Longstanding collaboration with Emotion Labs has influenced the success and autonomy of Project Ire.
Summary
Project Ire marks a milestone in using autonomous AI for scalable, evidence-driven malware classification and threat detection, reducing analyst burden and setting the stage for next-generation cybersecurity automation within Microsoft’s Defender suite.
This post first appeared on “Microsoft News”.