Production RAG at Scale with Azure Database for PostgreSQL | POSETTE: An Event for Postgres 2026
Paula Santamaría and Julia Schröder Langhaeuser present a production Retrieval-Augmented Generation (RAG) architecture built on Azure Database for PostgreSQL. They explain why Postgres can be a solid foundation for RAG at scale and what it takes to move from prototype to production, including performance tuning and monitoring.
Overview
The talk walks through how Serenity Star runs a production RAG system for an enterprise knowledge management platform using Azure Database for PostgreSQL as the core datastore for both traditional relational data and vector search.
Key themes include:
- Why choose PostgreSQL (with extensions) instead of a specialized vector database for production RAG.
- How pgvector is used for vector storage and similarity search.
- How DiskANN is being explored for high-performance vector indexing.
- How the application integrates with Microsoft Semantic Kernel.
- Practical production concerns: multi-tenancy, versioning at scale, performance fixes for indexing/search, and monitoring/observability.
Topics called out in the video description
End-to-end RAG pipeline
- Document ingestion and chunking
- Generating embeddings (including the reality of multiple embedding models and different vector sizes)
- Vector storage in Postgres
- Semantic search and retrieval
- Passing retrieved context into an LLM-backed answer flow
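The stages listed above can be sketched in a few lines of plain Python. This is an illustrative toy, not the pipeline shown in the talk: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, the store is an in-memory list rather than a Postgres table, and every function name here is hypothetical.

```python
import math
import zlib


def chunk_words(text, size=8, overlap=2):
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def toy_embed(text, dim=64):
    """Stand-in for a real embedding model: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        token = word.strip(".,;:!?")
        if token:
            vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ingest(document):
    """Chunk a document and pair every chunk with its embedding."""
    return [(chunk, toy_embed(chunk)) for chunk in chunk_words(document)]


def retrieve(query, store, k=2):
    """Return the k chunks whose embeddings are most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a production system the embedding call would go to a hosted model and the ranking would be pushed down into the database, but the shape of the flow (chunk, embed, store, search, prompt) stays the same.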
Architecture and scaling considerations
- Production architecture overview and the rationale for the Postgres choice
- Multi-tenant application considerations
- Versioning at scale (noted at ~1.4M records)
- Similarity search tuning, including a multi-column index issue and fix
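As a rough illustration of how multiple vector sizes and multi-tenancy can interact, the sketch below routes each embedding model to a table sized for its vectors and always scopes searches to one tenant. The talk description does not say how Serenity Star structures this; the one-table-per-dimension scheme, the table and column names (`chunks_384d`, `tenant_id`, `embedding`, `content`), and the model names are all assumptions made for the example. The only pgvector behaviour relied on is that a `vector(n)` column has a fixed dimension and that `<=>` orders by cosine distance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingModel:
    name: str        # hypothetical model identifier
    dimensions: int  # size of the vectors this model produces


def table_for(model):
    """Hypothetical naming scheme: one chunk table per vector size."""
    return f"chunks_{model.dimensions}d"


def validate_embedding(model, vector):
    """Reject vectors whose length does not match the model's declared size."""
    if len(vector) != model.dimensions:
        raise ValueError(
            f"{model.name} expects {model.dimensions} dimensions, got {len(vector)}"
        )
    return vector


def build_search_sql(model, k=5):
    """Build a tenant-scoped nearest-neighbour query with driver placeholders.

    The tenant id and the query vector are passed as parameters (%s),
    never interpolated into the SQL text.
    """
    return (
        f"SELECT id, content FROM {table_for(model)} "
        "WHERE tenant_id = %s "
        "ORDER BY embedding <=> %s::vector "
        f"LIMIT {int(k)}"
    )


small = EmbeddingModel(name="small-model", dimensions=384)
large = EmbeddingModel(name="large-model", dimensions=1536)
small_query = build_search_sql(small, k=5)
large_query = build_search_sql(large, k=10)
```

Keeping the tenant filter inside the query builder means no caller can forget it, which is one common way to reduce the risk of cross-tenant leakage in a shared table.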
Vector search performance
- Using pgvector for similarity search
- Exploring DiskANN for faster vector search and indexing
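For intuition about what a pgvector similarity search ranks by, the sketch below reimplements in plain Python the three distance measures behind pgvector's `<->` (Euclidean distance), `<=>` (cosine distance), and `<#>` (negative inner product) operators. The function names are hypothetical, and this is not how pgvector or DiskANN compute results internally; an index such as DiskANN is there to avoid comparing the query against every stored vector the way this brute-force version does.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, the quantity pgvector's <-> operator returns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """Negated dot product, the quantity pgvector's <#> operator returns."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def to_pgvector_literal(vec):
    """Format a Python sequence as pgvector's text form, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"


def brute_force_top_k(query, vectors, k=3, distance=cosine_distance):
    """Exact nearest neighbours by scanning every vector (no index)."""
    order = sorted(range(len(vectors)), key=lambda i: distance(query, vectors[i]))
    return order[:k]
```

All three operators are defined so that smaller values mean "closer", which is why pgvector queries sort ascending and why the inner product is negated.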
Production operations
- Using Azure Database for PostgreSQL capabilities for scalability
- Monitoring and observability approaches for a production RAG system processing large query volumes
Video chapters (from the description)
- 00:00 Music & introduction
- 00:47 About us: Serenity Star and AI agents
- 02:41 The four dimensions of an AI agent
- 04:08 Demo: from knowledge upload to agent answer
- 08:07 The four knowledge processing stages
- 09:02 Agent execution pipeline: from query to LLM
- 10:08 Architecture overview and the Postgres choice
- 12:43 Multiple embedding models, different vector sizes
- 14:16 Multi-tenant applications, versioning at 1.4M records
- 15:23 Similarity search and a multi-column index fix
- 20:03 Exploring DiskANN for vector search
- 21:25 Benefits in production: Azure Database for PostgreSQL scalability and monitoring
Links
- POSETTE conference site: https://posetteconf.com
- POSETTE playlist: https://aka.ms/posette-playlist