Profile and optimize agentic AI on Windows | DEMSP384

Uploaded: 2026-06-05T06:11:46+00:00

By Freddy Chiu

Freddy Chiu demonstrates how to profile and tune agentic AI applications on Intel-powered Windows PCs, focusing on end-to-end performance across CPU, GPU, and NPU. The session shows how to collect telemetry, identify bottlenecks, and apply practical optimization techniques to improve responsiveness and power efficiency.

Overview

This Microsoft Build 2026 demo session covers profiling and optimizing agentic AI apps running on Windows, with emphasis on measuring performance across the full hardware/software stack and using telemetry to find and fix bottlenecks.

What the session covers

Profiling goals for agentic AI apps

Improve responsiveness (latency) and overall user experience.
Reduce power usage by understanding where work is happening (CPU vs GPU vs NPU).
Identify bottlenecks across the system rather than focusing only on model inference.
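
As a rough illustration of looking beyond model inference, the sketch below times each stage of one agent turn and reports where the wall-clock time went. The stage names and the `time.sleep` calls are hypothetical stand-ins for real work; none of this is taken from the session itself.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name (hypothetical stage names).
stage_seconds = defaultdict(float)

@contextmanager
def stage(name):
    """Add the wall-clock time of the enclosed block to stage_seconds[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[name] += time.perf_counter() - start

def run_agent_turn():
    """Stand-in for one agent turn; the sleeps simulate real work."""
    with stage("prompt_preparation"):
        time.sleep(0.01)
    with stage("model_inference"):
        time.sleep(0.03)
    with stage("tool_call"):
        time.sleep(0.05)

def slowest_stage(totals):
    """Return the name of the stage with the largest accumulated time."""
    return max(totals, key=totals.get)

run_agent_turn()
total = sum(stage_seconds.values())
for name, seconds in sorted(stage_seconds.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {seconds * 1000:7.1f} ms  {seconds / total:5.1%}")
print("slowest stage:", slowest_stage(stage_seconds))
```

In this simulated turn the tool call, not inference, takes the largest share, which is the kind of finding a system-wide view is meant to surface.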

Telemetry types

Platform/system telemetry: hardware- and OS-level signals that help explain system behavior.
Application/middleware telemetry: instrumentation inside the app and its AI/runtime stack to understand where time is spent.

Intel tracing integration (ITT)

Demonstrates integrating the Intel Instrumentation and Tracing Technology (ITT) API for software telemetry.
Focuses on adding instrumentation so profiling tools can attribute time/cost to meaningful tasks.
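
In the ITT C API, a task is bracketed by `__itt_task_begin` and `__itt_task_end` calls on a domain created with `__itt_domain_create`. The sketch below only mirrors that begin/end pattern in plain Python so the idea can be run anywhere; `TraceDomain` and the function names are hypothetical, and the code does not call ITT.

```python
import time
from contextlib import contextmanager

class TraceDomain:
    """Hypothetical stand-in for a tracing domain; it does not call ITT."""

    def __init__(self, name):
        self.name = name
        self.records = []  # (task_name, start_seconds, end_seconds)

    @contextmanager
    def task(self, task_name):
        """Record a begin/end pair around the enclosed block."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.records.append((task_name, start, time.perf_counter()))

domain = TraceDomain("agent.app")

def tokenize(text):
    with domain.task("tokenize"):
        return text.split()

def generate(tokens):
    with domain.task("generate"):
        return " ".join(reversed(tokens))  # placeholder for model inference

reply = generate(tokenize("profile the agent"))
for task_name, start, end in domain.records:
    print(f"{domain.name}/{task_name}: {(end - start) * 1e6:.0f} us")
```

Naming tasks after application-level steps, rather than after functions deep in the runtime, is what lets a profiler attribute time to work a developer recognizes.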

System-level hardware/software interaction

Discusses how to reason about performance across:
- CPU
- GPU
- NPU
Highlights the need to correlate software events with hardware-level metrics.
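
One way to picture this correlation is to join application spans with platform samples on their timestamps. The sketch below averages per-device utilization over each span; the span names, sample values, and data layout are all invented for illustration and do not come from any specific tool.

```python
from statistics import mean

# Hypothetical application spans: (name, start_ms, end_ms).
spans = [
    ("tokenize", 0, 20),
    ("inference", 20, 80),
    ("tool_call", 80, 100),
]

# Hypothetical platform samples: (timestamp_ms, {device: utilization_percent}).
samples = [
    (10, {"CPU": 70, "GPU": 5, "NPU": 0}),
    (30, {"CPU": 20, "GPU": 10, "NPU": 90}),
    (50, {"CPU": 15, "GPU": 10, "NPU": 95}),
    (70, {"CPU": 25, "GPU": 10, "NPU": 85}),
    (90, {"CPU": 60, "GPU": 0, "NPU": 0}),
]

def utilization_by_span(spans, samples):
    """Average each device's utilization over the samples inside each span."""
    result = {}
    for name, start, end in spans:
        inside = [metrics for ts, metrics in samples if start <= ts < end]
        if not inside:
            result[name] = {}  # no samples fell inside this span
            continue
        devices = inside[0].keys()
        result[name] = {d: mean(m[d] for m in inside) for d in devices}
    return result

report = utilization_by_span(spans, samples)
for name, devices in report.items():
    print(name, devices)
```

With this made-up data, the inference span lines up with high NPU utilization while the tool call falls back to the CPU, which is the kind of attribution that neither data source shows on its own.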

Tool invocation and optimization workflow

Shows a workflow for invoking profiling tools, collecting traces, and iterating on optimizations.
Covers building performance profiles and applying tuning techniques.
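
The iterate-and-measure loop can be sketched as a small harness that compares a baseline against a candidate change. The two workloads below are hypothetical placeholders for an agent step before and after tuning; using the median over several runs is a choice made here to dampen outliers, not something the session prescribes.

```python
import time
from statistics import median

def measure(fn, runs=5, warmup=1):
    """Return the median wall-clock seconds of fn() over several runs."""
    for _ in range(warmup):
        fn()  # warm-up runs are discarded
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return median(timings)

def speedup(baseline_seconds, candidate_seconds):
    """Ratio above 1.0 means the candidate is faster than the baseline."""
    return baseline_seconds / candidate_seconds

# Hypothetical workloads standing in for an agent step before and after a change.
def baseline():
    return sum(i * i for i in range(200_000))

def candidate():
    return sum(i * i for i in range(50_000))

base = measure(baseline)
cand = measure(candidate)
print(f"baseline {base * 1000:.2f} ms, candidate {cand * 1000:.2f} ms, "
      f"speedup {speedup(base, cand):.2f}x")
```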

Platform telemetry and hardware-level metrics

Introduces platform telemetry concepts and the kinds of hardware-level metrics used to diagnose bottlenecks.

Custom instrumentation and task creation

Demonstrates creating custom tasks/markers in telemetry data to make traces easier to interpret.
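
To show why named tasks and markers make a trace easier to read, the sketch below records nested tasks plus an instantaneous marker and prints them as an indented timeline. The `Tracer` class and all event names are hypothetical; a real profiler would supply its own API for this.

```python
import time
from contextlib import contextmanager

class Tracer:
    """Hypothetical in-memory tracer with nested tasks and point markers."""

    def __init__(self):
        self.events = []  # (kind, name, depth, seconds_since_start)
        self._depth = 0
        self._origin = time.perf_counter()

    def _now(self):
        return time.perf_counter() - self._origin

    def marker(self, name):
        """Record an instantaneous event at the current nesting depth."""
        self.events.append(("marker", name, self._depth, self._now()))

    @contextmanager
    def task(self, name):
        """Record begin and end events; tasks opened inside are nested."""
        self.events.append(("begin", name, self._depth, self._now()))
        self._depth += 1
        try:
            yield
        finally:
            self._depth -= 1
            self.events.append(("end", name, self._depth, self._now()))

tracer = Tracer()
with tracer.task("agent_turn"):
    with tracer.task("plan"):
        tracer.marker("tool_selected")
    with tracer.task("execute_tool"):
        pass

for kind, name, depth, t in tracer.events:
    print(f"{t * 1e3:8.3f} ms  {'  ' * depth}{kind} {name}")
```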

Demo: model and compilation-time analysis

Transitions into a live demo using a 3.6B-parameter model.
Analyzes:
- Model compilation time
- Software stack operations involved in getting the model running
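
A simple way to separate compilation time from the rest of startup is to time each phase on its own. In the sketch below, `load_model`, `compile_model`, and `infer` are hypothetical stand-ins with simulated delays, and `"model.bin"` is a made-up path; a real application would call its runtime at those points.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Hypothetical stand-ins; a real app would call its runtime here instead.
def load_model(path):
    time.sleep(0.01)  # simulated file read
    return {"path": path}

def compile_model(model, device):
    time.sleep(0.05)  # simulated compilation cost
    return {"model": model, "device": device}

def infer(compiled, prompt):
    time.sleep(0.005)  # simulated inference
    return prompt.upper()

phases = {}
model, phases["load"] = timed(load_model, "model.bin")
compiled, phases["compile"] = timed(compile_model, model, "NPU")
_, phases["first_inference"] = timed(infer, compiled, "hello")
output, phases["steady_inference"] = timed(infer, compiled, "hello")

for name, seconds in phases.items():
    print(f"{name:18s} {seconds * 1000:7.1f} ms")
```

Timing the first inference separately from later ones is a deliberate choice in this sketch, since one-time setup costs can otherwise be mistaken for steady-state latency.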

Tools and technologies mentioned

Windows ML
OpenVINO
Intel Instrumentation and Tracing Technology (ITT) API
Profiling across CPU/GPU/NPU

Session metadata

Event: Microsoft Build 2026
Session code: DEMSP384
Format: Demo (Advanced)
Speakers listed in the session description:
- Freddy Chiu
- Vasanth Tovinkere