Production RAG at Scale with Azure Database for PostgreSQL | POSETTE: An Event for Postgres 2026

Uploaded: 2026-06-17T20:50:19+00:00

By Paula Santamaría and Julia Schröder Langhaeuser

Paula Santamaría and Julia Schröder Langhaeuser present a production Retrieval-Augmented Generation (RAG) architecture built on Azure Database for PostgreSQL. They explain why Postgres can be a solid foundation for RAG at scale and what it takes to move from prototype to production, including performance tuning and monitoring.

Overview

The talk walks through how Serenity Star runs a production RAG system for an enterprise knowledge management platform using Azure Database for PostgreSQL as the core datastore for both traditional relational data and vector search.

Key themes include:

Why choose PostgreSQL (with extensions) instead of a specialized vector database for production RAG.
How pgvector is used for vector storage and similarity search.
How DiskANN is explored as a high-performance vector index.
How the application integrates with Microsoft Semantic Kernel.
Practical production concerns: multi-tenancy, versioning at scale, performance fixes for indexing/search, and monitoring/observability.

Topics called out in the video description

End-to-end RAG pipeline

Document ingestion and chunking
Generating embeddings (including the reality of multiple embedding models and different vector sizes)
Vector storage in Postgres
Semantic search and retrieval
Passing retrieved context into an LLM-backed answer flow
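
The pipeline stages listed above can be sketched end to end in a few lines of Python. This is a minimal illustration, not the system shown in the talk: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, an in-memory list stands in for the Postgres table, and the final prompt is only assembled, not sent to an LLM.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Naive fixed-size chunking: group max_words whitespace-separated words per chunk."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ingest(documents: list[str]) -> list[tuple[str, Counter]]:
    """Chunk each document and keep (chunk, embedding) pairs; the list stands in for Postgres."""
    return [(chunk, toy_embed(chunk)) for doc in documents for chunk in chunk_text(doc)]


def retrieve(query: str, store: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(store, key=lambda row: cosine_similarity(q, row[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the retrieved context and the question into one LLM prompt."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = ingest([
    "Postgres stores relational rows and vectors together.",
    "Bananas are yellow fruit rich in potassium.",
])
question = "Are bananas yellow?"
prompt = build_prompt(question, retrieve(question, store, k=1))
```

In a production setup the `ingest` and `retrieve` steps would be replaced by inserts into, and similarity queries against, a pgvector column.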

Architecture and scaling considerations

Production architecture overview and the rationale for the Postgres choice
Multi-tenant application considerations
Versioning at scale (noted at ~1.4M records)
Similarity search tuning, including a multi-column index issue and fix
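
One way to combine multiple embedding sizes with tenant scoping in a single table follows the pattern described in the pgvector documentation: an un-dimensioned `vector` column plus one partial expression index per model. The sketch below only builds the SQL strings. The table `chunks`, its columns, and the model ids and dimensions are all hypothetical, and this is not necessarily the schema or the multi-column index fix discussed in the talk.

```python
# Hypothetical schema: a single un-dimensioned vector column shared by all embedding models.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE chunks (
    id         bigserial PRIMARY KEY,
    tenant_id  uuid    NOT NULL,
    model_id   integer NOT NULL,
    version    integer NOT NULL,
    content    text    NOT NULL,
    embedding  vector  NOT NULL  -- no fixed dimension, so sizes can differ per model
);
"""

# Hypothetical model ids mapped to their embedding dimensions.
MODEL_DIMS = {1: 384, 2: 1536}


def index_sql(model_id: int, dims: int) -> str:
    """Partial expression index: covers one model's rows, cast to that model's dimension."""
    return (
        f"CREATE INDEX chunks_embedding_m{model_id}_idx ON chunks "
        f"USING hnsw ((embedding::vector({dims})) vector_cosine_ops) "
        f"WHERE model_id = {model_id};"
    )


def search_sql(model_id: int, dims: int) -> str:
    """Tenant-scoped top-k query; %(name)s placeholders are filled in by the database driver."""
    return (
        "SELECT id, content FROM chunks "
        f"WHERE model_id = {model_id} AND tenant_id = %(tenant_id)s "
        f"ORDER BY embedding::vector({dims}) <=> %(query_vec)s::vector({dims}) "
        "LIMIT %(k)s;"
    )


index_statements = [index_sql(m, d) for m, d in MODEL_DIMS.items()]
```

The `ORDER BY` expression repeats the cast used in the index definition, and the `model_id` literal matches the index predicate, so the planner can consider the per-model index. With approximate indexes, pgvector applies filters after the index scan, so a highly selective tenant filter may return fewer than `k` rows unless the query or index strategy accounts for it.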

Vector search performance

Using pgvector for similarity search
Exploring DiskANN for faster vector search/indexing
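
For readers new to pgvector, the sketch below mirrors its three common distance operators in plain Python and shows what index definitions for HNSW (pgvector) and DiskANN (the `pg_diskann` extension on Azure Database for PostgreSQL) typically look like. The table `chunks` and its fixed-dimension column `embedding` are hypothetical, the choice of cosine distance is an assumption rather than something stated in the talk, and `pg_diskann` has to be enabled on the server before the DDL will run.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance, what pgvector's <-> operator computes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, what pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def negative_inner_product(a: list[float], b: list[float]) -> float:
    """Negated dot product, what pgvector's <#> operator computes."""
    return -sum(x * y for x, y in zip(a, b))


# Hypothetical DDL; assumes a table chunks(embedding vector(1536)) already exists.
HNSW_INDEX_SQL = (
    "CREATE INDEX chunks_embedding_hnsw_idx ON chunks "
    "USING hnsw (embedding vector_cosine_ops);"
)
DISKANN_INDEX_SQL = (
    "CREATE EXTENSION IF NOT EXISTS pg_diskann CASCADE;\n"
    "CREATE INDEX chunks_embedding_diskann_idx ON chunks "
    "USING diskann (embedding vector_cosine_ops);"
)
```

Both index types serve the same `ORDER BY embedding <=> ... LIMIT k` query shape, so switching between them is mainly a matter of which index exists on the column.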

Production operations

Using Azure Database for PostgreSQL capabilities for scalability
Monitoring and observability approaches for a production RAG system processing large query volumes

Video chapters (from the description)

00:00 Music & introduction
00:47 About us: Serenity Star and AI agents
02:41 The four dimensions of an AI agent
04:08 Demo: from knowledge upload to agent answer
08:07 The four knowledge processing stages
09:02 Agent execution pipeline: from query to LLM
10:08 Architecture overview and the Postgres choice
12:43 Multiple embedding models, different vector sizes
14:16 Multi-tenant applications, versioning at 1.4M records
15:23 Similarity search and a multi-column index fix
20:03 Exploring DiskANN for vector search
21:25 Benefits in production: Azure Database for PostgreSQL scalability and monitoring