Profile and optimize agentic AI on Windows | DEMSP384
Freddy Chiu demonstrates how to profile and tune agentic AI applications on Intel-powered Windows PCs, focusing on end-to-end performance across CPU, GPU, and NPU. The session shows how to collect telemetry, identify bottlenecks, and apply practical optimization techniques to improve responsiveness and power efficiency.
Overview
This Microsoft Build 2026 demo session covers profiling and optimizing agentic AI apps running on Windows, with emphasis on measuring performance across the full hardware/software stack and using telemetry to find and fix bottlenecks.
What the session covers
Profiling goals for agentic AI apps
- Improve responsiveness (latency) and overall user experience.
- Reduce power usage by understanding where work is happening (CPU vs GPU vs NPU).
- Identify bottlenecks across the system rather than focusing only on model inference.
Telemetry types
- Platform/system telemetry: hardware- and OS-level signals that help explain system behavior.
- Application/middleware telemetry: instrumentation inside the app and its AI/runtime stack to understand where time is spent.
Intel tracing integration (ITT)
- Demonstrates integrating Intel's Instrumentation and Tracing Technology (ITT) API for software telemetry.
- Focuses on adding instrumentation so profiling tools can attribute time/cost to meaningful tasks.
System-level hardware/software interaction
- Discusses how to reason about performance across the CPU, GPU, and NPU.
- Highlights the need to correlate software events with hardware-level metrics.
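One way to picture the correlation step is to average each hardware metric over the time window of each software event. The sketch below assumes spans and metric samples share a common clock. The `correlate` helper, the span names, and every number are hypothetical illustrations, not data or tooling from the session.

```python
from collections import defaultdict

def correlate(spans, samples):
    """Average each device metric over the samples that fall inside each span.

    spans:   list of (name, start_s, end_s)
    samples: list of (timestamp_s, device, value)
    """
    result = {}
    for name, start, end in spans:
        buckets = defaultdict(list)
        for ts, device, value in samples:
            if start <= ts < end:
                buckets[device].append(value)
        result[name] = {dev: sum(vals) / len(vals) for dev, vals in buckets.items()}
    return result

# Made-up utilization percentages for illustration only.
spans = [("prompt_processing", 0.0, 2.0), ("token_generation", 2.0, 4.0)]
samples = [
    (0.5, "CPU", 80.0), (0.5, "NPU", 10.0),
    (1.5, "CPU", 60.0), (1.5, "NPU", 30.0),
    (2.5, "CPU", 20.0), (2.5, "NPU", 90.0),
    (3.5, "CPU", 10.0), (3.5, "NPU", 70.0),
]
for name, metrics in correlate(spans, samples).items():
    print(name, metrics)
```

A per-span average like this only helps if the software events are named after meaningful units of work, which is why the instrumentation and the hardware metrics are discussed together.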
Tool invocation and optimization workflow
- Shows a workflow for invoking profiling tools, collecting traces, and iterating on optimizations.
- Covers building performance profiles and applying tuning techniques.
Platform telemetry and hardware-level metrics
- Introduces platform telemetry concepts and the kinds of hardware-level metrics used to diagnose bottlenecks.
Custom instrumentation and task creation
- Demonstrates creating custom tasks/markers in telemetry data to make traces easier to interpret.
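A minimal sketch of the custom-task idea, written with only the Python standard library rather than ITT itself. The `TaskRecorder` class and the task names (`agent_turn`, `tool_call`, `llm_generate`) are hypothetical and do not come from the session. A real integration would emit the begin and end events through the ITT API so that profiling tools can display them.

```python
import time
from contextlib import contextmanager

class TaskRecorder:
    """Collects named, possibly nested task spans (hypothetical helper, not an ITT API)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._stack = []   # names of the tasks that are currently open
        self.spans = []    # completed spans, appended when each task ends

    @contextmanager
    def task(self, name):
        start = self._clock()
        self._stack.append(name)
        try:
            yield
        finally:
            end = self._clock()
            self._stack.pop()
            self.spans.append({
                "name": name,
                "parent": self._stack[-1] if self._stack else None,
                "start": start,
                "duration": end - start,
            })

recorder = TaskRecorder()
with recorder.task("agent_turn"):
    with recorder.task("tool_call"):
        time.sleep(0.01)       # stand-in for real work
    with recorder.task("llm_generate"):
        time.sleep(0.02)       # stand-in for real work

for span in recorder.spans:
    print(f'{span["name"]:<14} parent={span["parent"]} {span["duration"] * 1000:.1f} ms')
```

Naming spans after application-level steps, rather than function names, is what lets a trace answer questions such as "how much of this agent turn was spent waiting on a tool?".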
Demo: model and compilation-time analysis
- Transitions into a live demo using a 3.6B-parameter model.
- Analyzes model compilation time and the software stack operations involved in getting the model running.
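As a rough illustration of separating compilation time from inference time, the sketch below times the two phases independently. `compile_model`, `run_inference`, and the `model.bin` path are hypothetical stand-ins that only sleep. A real app would call its runtime's own compile and inference steps in their place, and the printed numbers here say nothing about the model used in the demo.

```python
import time

def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Hypothetical stand-ins for the runtime's compile and inference steps.
def compile_model(path):
    time.sleep(0.05)
    return {"path": path}

def run_inference(model, prompt):
    time.sleep(0.01)
    return f"reply to {prompt!r}"

model, compile_s = timed(compile_model, "model.bin")
_, infer_s = timed(run_inference, model, "hello")

total = compile_s + infer_s
print(f"compile: {compile_s * 1000:.0f} ms ({compile_s / total:.0%} of first response)")
print(f"infer:   {infer_s * 1000:.0f} ms")
```

Reporting the phases separately makes it clear whether a slow first response comes from one-time setup or from the inference path itself.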
Tools and technologies mentioned
- Windows ML
- OpenVINO
- Intel Instrumentation and Tracing Technology (ITT) API
- Profiling across CPU/GPU/NPU
Session metadata
- Event: Microsoft Build 2026
- Session code: DEMSP384
- Format: Demo (Advanced)
- Speakers listed in the session description: Freddy Chiu, Vasanth Tovinkere