.NET AI Community Standup: Blazing-Fast AI Inference on a Budget
Presented by: Bruno Capuano and Tal Wald
Hosted by: the .NET team
Overview
This session explores how developers can process over 20,000 sentences per second at minimal cost by leveraging .NET for AI inference. The presenters show how AI workloads that are typically developed in Python can be accelerated and optimized using Microsoft's .NET ecosystem.
Highlights include:
- Leveraging Hugging Face models in .NET workflows
- Utilizing ONNX Runtime for high-performance inference
- Making use of the latest .NET 9 and 10 APIs
- Designing flexible AI libraries to avoid tight coupling with specific engines or hardware
Key Topics Covered
Migrating from Python to .NET for AI
- Advantages of migrating AI/ML workloads to .NET
- Compatibility with popular Python-based workflows
Hugging Face Model Integration
- Using pre-trained models from Hugging Face repositories
- Adapting models for .NET-based inference (a sketch follows this list)
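To make the workflow concrete, here is a hedged sketch rather than code from the session: a Hugging Face sentence-embedding model, for example sentence-transformers/all-MiniLM-L6-v2, can be exported to ONNX with Hugging Face's optimum tooling (e.g. `optimum-cli export onnx --model sentence-transformers/all-MiniLM-L6-v2 model/`), after which the text is tokenized on the .NET side with Microsoft.ML.Tokenizers. The model name, vocab path, and tokenizer type below are assumptions that depend on which model you export.

```csharp
using Microsoft.ML.OnnxRuntime.Tensors;
using Microsoft.ML.Tokenizers;

// BertTokenizer and the vocab.txt path are assumptions; use whatever
// tokenizer matches the model you exported.
var tokenizer = BertTokenizer.Create("model/vocab.txt");

// Encode one sentence to token ids.
IReadOnlyList<int> ids = tokenizer.EncodeToIds("Blazing-fast inference in .NET");

// Exported transformer encoders typically expect [batch, sequence] int64
// tensors named input_ids / attention_mask; confirm the names for your model.
var inputIds = new DenseTensor<long>(new[] { 1, ids.Count });
var attentionMask = new DenseTensor<long>(new[] { 1, ids.Count });
for (int i = 0; i < ids.Count; i++)
{
    inputIds[0, i] = ids[i];
    attentionMask[0, i] = 1; // single unpadded sentence
}
```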
High-Performance Inference with ONNX Runtime
- Introduction to ONNX and ONNX Runtime
- Benchmarks: achieving over 20,000 sentences/sec (see the batched-inference sketch below)
- Strategies for cost-effective deployment
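A minimal sketch of running such a model with the ONNX Runtime C# API (Microsoft.ML.OnnxRuntime). The model path and input/output names are assumptions, and the dummy token ids stand in for real tokenizer output. Headline throughput numbers come from reusing one session and batching many sentences per Run call, not from scoring sentences one at a time.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Create the session once and reuse it; Run calls are thread-safe.
using var session = new InferenceSession("model/model.onnx");

// Dummy [1, 4] inputs; in practice these come from the tokenizer step.
var inputIds = new DenseTensor<long>(new long[] { 101, 7592, 2088, 102 }, new[] { 1, 4 });
var attentionMask = new DenseTensor<long>(new long[] { 1, 1, 1, 1 }, new[] { 1, 4 });

var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("input_ids", inputIds),
    NamedOnnxValue.CreateFromTensor("attention_mask", attentionMask),
};

// Run returns one value per model output; "last_hidden_state" is a common
// name for transformer encoders, but it depends on the export.
using var results = session.Run(inputs);
var hidden = results.First().AsTensor<float>(); // [batch, sequence, hidden]
```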
.NET 9 & 10 AI APIs
- Overview of new APIs supporting AI workloads
- Increased flexibility and abstraction for developers
- Decoupling inference logic from underlying hardware (illustrated below)
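The abstractions in question are the Microsoft.Extensions.AI interfaces that began shipping alongside .NET 9. A small sketch of the decoupling idea, assuming an embedding scenario: the caller is written against IEmbeddingGenerator<string, Embedding<float>> and never names a concrete engine, so local ONNX Runtime or a hosted endpoint can be swapped in at composition time. This is illustrative wiring, not the session's exact code.

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.AI;

// The caller depends only on the abstraction; which backend produces the
// vectors (local ONNX Runtime, a hosted API, ...) is a wiring decision.
static async Task<float[]> EmbedAsync(
    IEmbeddingGenerator<string, Embedding<float>> generator, string text)
{
    GeneratedEmbeddings<Embedding<float>> result =
        await generator.GenerateAsync(new[] { text });
    return result[0].Vector.ToArray();
}
```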
AI Library Architecture
- Building libraries that interface with multiple inference backends (see the skeleton below)
- Best practices for extensibility and performance
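One reading of "multiple inference backends", assuming the Microsoft.Extensions.AI shape from above: a library can expose its ONNX Runtime path as an IEmbeddingGenerator implementation, so anything written against the abstraction can consume it. This skeleton is hypothetical, not the standup's own library code, and the tokenization and pooling steps are elided.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.ML.OnnxRuntime;

// Hypothetical backend: ONNX Runtime behind the standard embedding
// abstraction, so callers never reference the engine directly.
// (Interface members match recent Microsoft.Extensions.AI releases and
// may differ slightly in older previews.)
public sealed class OnnxEmbeddingGenerator
    : IEmbeddingGenerator<string, Embedding<float>>
{
    private readonly InferenceSession _session;

    public OnnxEmbeddingGenerator(string modelPath)
        => _session = new InferenceSession(modelPath);

    public Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        // Tokenize, run _session.Run(...), then pool the hidden states
        // into one vector per sentence (elided; see the earlier sketches).
        throw new NotImplementedException();
    }

    public object? GetService(Type serviceType, object? serviceKey = null) => null;

    public void Dispose() => _session.Dispose();
}
```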
Demos and Performance Comparisons
- Real-world demonstrations of .NET-based AI inference
- GPU versus CPU utilization metrics (a rough benchmark sketch follows this list)
- Cost and performance trade-offs
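The GPU-versus-CPU comparisons map onto ONNX Runtime execution providers: CPU is the default, and the CUDA provider (available via the Microsoft.ML.OnnxRuntime.Gpu package) routes the same model to the GPU. The loop below is a deliberately crude stand-in for the session's benchmarks, using zero-filled dummy batches; model path and input names are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

// Opt the session into CUDA; omit this to stay on the default CPU provider.
// (Requires the Microsoft.ML.OnnxRuntime.Gpu package and a CUDA-capable GPU.)
var options = new SessionOptions();
options.AppendExecutionProvider_CUDA(deviceId: 0);
using var session = new InferenceSession("model/model.onnx", options);

// Zero-filled dummy batch; real benchmarks would feed tokenized sentences.
int batchSize = 32, seqLen = 16;
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor("input_ids", new DenseTensor<long>(new[] { batchSize, seqLen })),
    NamedOnnxValue.CreateFromTensor("attention_mask", new DenseTensor<long>(new[] { batchSize, seqLen })),
};

// Crude throughput estimate: sentences processed per wall-clock second.
var sw = Stopwatch.StartNew();
int sentences = 0;
for (int i = 0; i < 100; i++)
{
    using var results = session.Run(inputs);
    sentences += batchSize;
}
sw.Stop();
Console.WriteLine($"{sentences / sw.Elapsed.TotalSeconds:F0} sentences/sec");
```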
From Research to Production
- Tips for transitioning ML research prototypes to scalable, production-ready solutions
- Managing operational costs
Conclusion
This standup offers actionable guidance for .NET developers bringing AI workloads to production. By combining ONNX Runtime, .NET's newest APIs, and a well-abstracted architecture, developers can achieve high performance at low cost without sacrificing flexibility.