Azure Storage for AI workloads | OD870
Saurabh Sensharma, Vishnu Charan TJ, and Saloni Sonpal walk through how Azure Storage can be used to improve performance and cost efficiency for AI inference workloads, including caching patterns, faster model distribution, and integrations across the AI stack.
Overview
The session covers how Azure Storage powers AI inference at scale, with an emphasis on:
- Securely bringing enterprise data to AI models
- Accelerating AI workloads with high-performance storage
- Reducing GPU idle time via faster model loading and optimized data access
- Integrating Azure Storage with Microsoft and open-source AI frameworks
- Enabling scalable, agent-based (agentic) applications
Topics and chapters
Introduction to Azure Storage for AI workloads
- High-level framing of storage needs for AI inference at scale
Storage for AI and AI for Storage
- Overview of how storage supports AI workloads and, conversely, how AI can be applied to storage scenarios
Azure Storage integration across the AI stack and infrastructure
- How Azure Storage fits into AI infrastructure and the broader AI stack
Azure Storage clients and tools for AI workloads
- Discussion of storage clients and tooling used to connect AI workloads to Azure Storage
Paths to run AI workloads with storage
The presenters outline common execution environments where Azure Storage is used:
- Azure AI Foundry
- Azure Kubernetes Service (AKS)
- Infrastructure-as-a-Service (IaaS)
Storage requirements for agentic inference
- Storage considerations for agent-based inference scenarios and the roles storage plays in those architectures
Inference optimization through prompt caching
- Prompt caching as a technique to improve inference performance by reusing work already done for repeated prompt prefixes instead of recomputing it
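Prompt caching in inference engines commonly works by reusing the attention key/value (KV) state already computed for a shared prompt prefix, such as a long system prompt. The sketch below is a minimal, hypothetical illustration of that general idea, not the method presented in the session. It assumes the prompt is split into fixed-size token blocks, each block is keyed by a chained hash of everything before it, and only blocks whose key is missing get computed. `BLOCK`, `block_keys`, `PrefixCache`, and the `compute_block` callback are illustrative names; the callback stands in for the model's expensive prefill step and returns a placeholder string instead of real tensors.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

BLOCK = 4  # tokens per cache block (hypothetical; kept tiny for readability)


def block_keys(tokens: Sequence[int], block: int = BLOCK) -> List[str]:
    """Return one key per *full* block. Each key hashes the previous key plus
    this block's tokens, so it identifies the entire prefix ending at that block."""
    keys: List[str] = []
    prev = ""
    full = len(tokens) - len(tokens) % block  # trailing partial block is not cached
    for i in range(0, full, block):
        chunk = ",".join(str(t) for t in tokens[i:i + block])
        prev = hashlib.sha256(f"{prev}|{chunk}".encode()).hexdigest()
        keys.append(prev)
    return keys


class PrefixCache:
    """In-memory prefix cache. compute_block stands in for the model's prefill
    of the prefix tokens[:end] (a placeholder for real KV tensors)."""

    def __init__(self, compute_block: Callable[[Sequence[int]], str]):
        self.store: Dict[str, str] = {}
        self.compute_block = compute_block
        self.hits = 0
        self.misses = 0

    def prefill(self, tokens: Sequence[int]) -> List[str]:
        states = []
        for idx, key in enumerate(block_keys(tokens)):
            if key in self.store:
                self.hits += 1  # reuse state computed for an earlier prompt
            else:
                self.misses += 1
                end = (idx + 1) * BLOCK
                self.store[key] = self.compute_block(tokens[:end])
            states.append(self.store[key])
        return states


# Two prompts sharing an 8-token "system prompt" reuse its two cached blocks.
cache = PrefixCache(lambda prefix: f"kv[{len(prefix)} tokens]")
system = [101, 102, 103, 104, 105, 106, 107, 108]
cache.prefill(system + [1, 2, 3, 4])
cache.prefill(system + [9, 9, 9, 9])
print(cache.hits, cache.misses)  # prints: 2 4
```

Because each key chains the hash of the preceding block, two prompts share cached blocks only up to the first block where they diverge, which is what keeps reused state valid. Caching only full blocks is a simplifying choice in this sketch.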
Explicit caching with Azure Blob and NIXL (demo)
- A demo showing explicit caching using Azure Blob Storage
- NIXL (NVIDIA Inference Xfer Library) integration is referenced as part of the caching approach
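The summary does not spell out how the Azure Blob and NIXL demo is wired, so the following is only a sketch under stated assumptions. It assumes "explicit" caching means the serving layer itself decides when to write cached state to, and read it from, a store shared by all replicas, with a small local tier in front. `BlobStandIn` and `TieredKVCache` are hypothetical names; `BlobStandIn` is an in-memory substitute for a Blob container so the example runs without credentials. In a real deployment it might wrap the Azure SDK's `ContainerClient.upload_blob` and `ContainerClient.download_blob`, and moving the bytes into GPU memory (where NIXL would come in) is not modeled at all.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional


class BlobStandIn:
    """Hypothetical in-memory substitute for a shared Blob container (no network I/O)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def get(self, name: str) -> Optional[bytes]:
        return self._blobs.get(name)

    def put(self, name: str, data: bytes) -> None:
        self._blobs[name] = data


class TieredKVCache:
    """Small local LRU tier in front of a remote store shared by all replicas."""

    def __init__(self, remote: BlobStandIn, local_capacity: int) -> None:
        self.remote = remote
        self.capacity = local_capacity
        self.local = OrderedDict()  # key -> bytes, least recently used first
        self.stats = {"local": 0, "remote": 0, "computed": 0}

    def _put_local(self, key: str, data: bytes) -> None:
        self.local[key] = data
        self.local.move_to_end(key)
        while len(self.local) > self.capacity:
            self.local.popitem(last=False)  # evict the least recently used entry

    def get_or_compute(self, key: str, compute: Callable[[], bytes]) -> bytes:
        if key in self.local:  # fastest path: already on this replica
            self.local.move_to_end(key)
            self.stats["local"] += 1
            return self.local[key]
        data = self.remote.get(key)
        if data is not None:  # another replica already wrote it
            self.stats["remote"] += 1
        else:  # miss everywhere: compute once, then write it explicitly for reuse
            data = compute()
            self.stats["computed"] += 1
            self.remote.put(key, data)
        self._put_local(key, data)
        return data


# Replica B reuses state that replica A computed, without recomputing it.
shared = BlobStandIn()
replica_a = TieredKVCache(shared, local_capacity=2)
replica_b = TieredKVCache(shared, local_capacity=2)
replica_a.get_or_compute("prefix-1", lambda: b"kv-state-1")
print(replica_b.get_or_compute("prefix-1", lambda: b"recomputed"))  # prints: b'kv-state-1'
```

The shared tier lets a prefix computed by one replica be fetched by another instead of recomputed, while the local LRU tier keeps hot entries off the network.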
Fast model loading and distribution
- Approaches to reduce model load time and improve distribution efficiency
- Run:AI Streamer and a distributed cache are referenced in this segment
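Loaders that aim to cut model load time generally avoid reading a checkpoint as one sequential stream and instead keep many reads in flight at once. The sketch below shows only that concurrency pattern against a local file, assuming storage throughput scales with parallel requests; it is not Run:AI Streamer's API or implementation, and `byte_ranges`, `read_range`, and `parallel_load` are hypothetical helpers. Against Blob Storage, each range would map to a ranged download (the Python SDK's `BlobClient.download_blob` accepts `offset` and `length`).

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def byte_ranges(size: int, chunk: int) -> List[Tuple[int, int]]:
    """Split [0, size) into (offset, length) pairs of at most `chunk` bytes."""
    return [(off, min(chunk, size - off)) for off in range(0, size, chunk)]


def read_range(path: str, offset: int, length: int) -> bytes:
    # Each call opens its own handle, so concurrent reads never share a file position.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def parallel_load(path: str, chunk: int = 1 << 20, workers: int = 8) -> bytes:
    """Read a file with up to `workers` concurrent ranged reads and reassemble it in order."""
    size = os.path.getsize(path)
    buf = bytearray(size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [(off, pool.submit(read_range, path, off, n))
                   for off, n in byte_ranges(size, chunk)]
        for off, fut in futures:
            data = fut.result()
            buf[off:off + len(data)] = data  # place each chunk at its original offset
    return bytes(buf)


if __name__ == "__main__":
    # Demo: write a dummy 5 MiB "checkpoint" and load it back in parallel.
    payload = os.urandom(5 * (1 << 20))
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        assert parallel_load(tmp.name) == payload
        print("loaded", len(payload), "bytes")
    finally:
        os.remove(tmp.name)
```

Chunk size and worker count are the tuning knobs here: smaller chunks allow more parallelism but add per-request overhead, so real loaders pick them based on the storage backend.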
Bringing enterprise data to AI via Azure integrations
- Azure integrations for connecting enterprise data to AI workflows
- Foundry IQ is referenced in the context of these integrations
Storage Center and recap
- Introduction of Storage Center
- Session recap of the main performance and scalability themes