Creating an Enterprise Data Virtualization Layer
John Savill explains why enterprises need a data virtualization layer and how to build one using Microsoft Fabric OneLake, including a single namespace approach, shortcuts, mirroring, governance, and semantic models to make data easier to use for analytics and AI.
Overview
The video lays out an enterprise approach to reducing data silos by creating a single, virtualized data layer that can span many underlying data sources while presenting a consistent way for teams to discover, access, and use data.
AI pressure on data
- AI initiatives increase demand for:
  - Broader access to data across the organization
  - Faster time-to-value for analytics and model building
  - Better governance and clarity on what data exists and where
The problem: data silos
- Data commonly ends up fragmented across:
  - Different teams and business units
  - Different storage systems and formats
  - Different governance boundaries
- This fragmentation makes it harder to:
  - Reuse data
  - Apply consistent governance
  - Build shared semantic definitions
Data virtualization layer (concept)
- A data virtualization layer aims to provide:
  - A single logical view over distributed data
  - Consistent access patterns
  - A foundation for governance and semantic modeling
Microsoft Fabric OneLake as the foundation
OneLake
- OneLake is presented as the core storage layer for Fabric: a single logical data lake for the whole organization, with one OneLake per tenant.
- The goal is to enable a unified approach to organizing and accessing data.
Single namespace and workspaces
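- The video discusses OneLake's single namespace, in which all data in the tenant sits under one hierarchy (workspaces, then items such as lakehouses and warehouses, then folders and files), with workspaces used to organize data and control access.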
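To make the single-namespace idea concrete, the sketch below builds addresses for data that belongs to two different workspaces. It assumes OneLake's ADLS Gen2-compatible URI pattern (abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<itemtype>/<path>, plus the equivalent https form); the helper functions and the workspace, item, and table names are hypothetical, and real paths should be checked against the OneLake documentation.

```python
# Sketch: addressing data in OneLake's single namespace.
# Assumes the ADLS Gen2-compatible URI pattern
#   abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<itemtype>/<path>
# Helper names and all workspace/item/table names are hypothetical.

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_abfss_uri(workspace: str, item: str, item_type: str, path: str = "") -> str:
    """Return an abfss:// URI for a path inside a Fabric item in OneLake."""
    base = f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}"
    path = path.strip("/")
    return f"{base}/{path}" if path else base


def onelake_https_uri(workspace: str, item: str, item_type: str, path: str = "") -> str:
    """Return the https:// form, where the workspace is the first path segment."""
    base = f"https://{ONELAKE_HOST}/{workspace}/{item}.{item_type}"
    path = path.strip("/")
    return f"{base}/{path}" if path else base


if __name__ == "__main__":
    # Two teams' data, addressed with the same host and the same scheme.
    print(onelake_abfss_uri("Sales", "SalesLakehouse", "Lakehouse", "Tables/orders"))
    print(onelake_abfss_uri("Finance", "FinanceLakehouse", "Lakehouse", "Tables/invoices"))
```

Because every workspace shares one host and one addressing scheme, a tool that can read one location can, permissions allowing, read another by changing only the workspace and item segments.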
Shortcuts
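- Shortcuts are covered as a way to reference data without duplicating it: a shortcut appears as a folder or table inside an item, while the data itself stays where it already lives.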
Types of data
- The video calls out that enterprises deal with multiple types of data that need to be represented consistently in the virtualized layer.
Shortcuts outside of OneLake
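- The video also covers shortcuts to data that lives outside OneLake (for example, Azure Data Lake Storage Gen2, Amazon S3, or Google Cloud Storage), which bring that data into the namespace without copying it.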
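Whether a shortcut points to another OneLake location or to external storage, the core idea is the same: the namespace holds a pointer, and reads are redirected to where the data lives. The following conceptual model shows that redirection using longest-prefix matching. It is not the Fabric shortcut API; the Shortcut and VirtualNamespace classes, the target URIs, and the bucket name are hypothetical.

```python
# Conceptual model of shortcuts (NOT the Fabric API; all names are hypothetical):
# a shortcut is a named entry in the namespace that stores a pointer to data
# held elsewhere, so reading through it never copies the data.
from dataclasses import dataclass


@dataclass(frozen=True)
class Shortcut:
    name: str    # logical path the shortcut appears under, e.g. "Tables/customers"
    target: str  # physical location: another OneLake item or external storage


class VirtualNamespace:
    def __init__(self) -> None:
        self._shortcuts: dict[str, Shortcut] = {}

    def add_shortcut(self, shortcut: Shortcut) -> None:
        # Normalize so "Tables/x/" and "/Tables/x" register the same entry.
        self._shortcuts[shortcut.name.strip("/")] = shortcut

    def resolve(self, path: str) -> str:
        """Map a logical path to its physical location via the longest matching shortcut."""
        path = path.strip("/")
        # Try longer names first so a more specific shortcut wins.
        for name in sorted(self._shortcuts, key=len, reverse=True):
            if path == name or path.startswith(name + "/"):
                rest = path[len(name):].lstrip("/")
                target = self._shortcuts[name].target.rstrip("/")
                return f"{target}/{rest}" if rest else target
        raise KeyError(f"no shortcut covers {path!r}")


if __name__ == "__main__":
    ns = VirtualNamespace()
    # Internal shortcut: data owned by another workspace in OneLake (hypothetical names).
    ns.add_shortcut(Shortcut(
        "Tables/customers",
        "abfss://CRM@onelake.dfs.fabric.microsoft.com/CRMLakehouse.Lakehouse/Tables/customers",
    ))
    # External shortcut: data that stays in an S3 bucket (hypothetical bucket).
    ns.add_shortcut(Shortcut("Files/clickstream", "s3://example-bucket/clickstream"))
    print(ns.resolve("Files/clickstream/2024/01.json"))
```

Consumers only ever see the logical path; moving or re-pointing the underlying data changes the shortcut's target, not the paths that reports and notebooks use.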
Managed transformations
- The video highlights managed transformations as part of building and maintaining the enterprise data layer.
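The section does not spell out the transformation mechanism, so the following is only an illustration of the general idea under one assumption: a managed step converts raw CSV files into typed, schema-checked rows once, so every consumer reads the same result instead of re-parsing the file. The schema, column names, and sample data are hypothetical, and the code uses only the Python standard library.

```python
# Illustration of a managed transformation (hypothetical; not a Fabric API):
# raw CSV text is converted once into typed rows that match a declared schema,
# so downstream consumers share one parsed, validated result.
import csv
import io
from typing import Any, Callable

# Column name -> converter applied to the raw string value.
Schema = dict[str, Callable[[str], Any]]


def csv_to_rows(raw: str, schema: Schema) -> list[dict[str, Any]]:
    """Parse CSV text, check required columns, and cast each declared column."""
    reader = csv.DictReader(io.StringIO(raw))
    missing = set(schema) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Keep only declared columns, each cast to its declared type; extra columns are dropped.
    return [{col: cast(record[col]) for col, cast in schema.items()} for record in reader]


if __name__ == "__main__":
    raw = "order_id,amount,region\n1,19.99,EU\n2,5.00,US\n"
    print(csv_to_rows(raw, {"order_id": int, "amount": float, "region": str}))
```

Failing fast on missing columns keeps a broken source file from silently producing an incomplete table that many consumers would then reuse.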
Mirroring
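- Mirroring is covered as a managed way to bring data into the virtualized enterprise layer: data from operational sources such as Azure SQL Database, Azure Cosmos DB, or Snowflake is continuously replicated into OneLake in Delta format.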
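Mirroring keeps a copy in OneLake current as the source changes. The sketch below models that idea as an ordered change feed (upserts and deletes keyed by primary key) applied to a replica. It is a conceptual illustration only, not how Fabric mirroring is implemented; the Change type, the operation names, and the sample rows are hypothetical.

```python
# Conceptual model of mirroring (not how Fabric implements it; names are hypothetical):
# a replica stays in sync by applying an ordered feed of changes captured from the
# source, instead of being rebuilt by periodic full copies.
from dataclasses import dataclass
from typing import Any, Literal, Optional

Row = dict[str, Any]


@dataclass(frozen=True)
class Change:
    op: Literal["upsert", "delete"]
    key: int
    row: Optional[Row] = None  # required for upserts, ignored for deletes


def apply_changes(replica: dict[int, Row], changes: list[Change]) -> dict[int, Row]:
    """Apply changes in source order to the replica (keyed by primary key) and return it."""
    for change in changes:
        if change.op == "upsert":
            if change.row is None:
                raise ValueError(f"upsert for key {change.key} has no row")
            replica[change.key] = dict(change.row)  # store a copy of the source row
        elif change.op == "delete":
            replica.pop(change.key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown op {change.op!r}")
    return replica


if __name__ == "__main__":
    feed = [
        Change("upsert", 1, {"name": "Contoso"}),
        Change("upsert", 2, {"name": "Fabrikam"}),
        Change("upsert", 1, {"name": "Contoso Ltd"}),
        Change("delete", 2),
    ]
    print(apply_changes({}, feed))
```

Order matters: applying the same feed out of order could resurrect a deleted row or keep a stale value, which is why change feeds are applied in source order.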
Building a single virtualized enterprise data layer
- The video ties together:
  - OneLake
  - Shortcuts
  - Managed transformations
  - Mirroring
- The outcome is a single logical enterprise data layer that reduces duplication and improves reuse.
Governance
- Governance is positioned as a required part of making the virtualized layer usable at scale: broad access to shared data is only safe when that access is governed.
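One way to picture safe, broad access is a deny-by-default check in which read grants can be scoped to a whole workspace or to a single item. The sketch is a simplified model, not Fabric's actual permission model; the Policy class, scope format, and principal names are hypothetical.

```python
# Simplified, deny-by-default read check for a shared data layer.
# Not Fabric's permission model; the Policy class and all names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Policy:
    # Each grant is (principal, scope), where scope is "workspace" or "workspace/item".
    read_grants: set[tuple[str, str]] = field(default_factory=set)

    def grant_read(self, principal: str, scope: str) -> None:
        self.read_grants.add((principal, scope.strip("/")))

    def can_read(self, principal: str, workspace: str, item: str) -> bool:
        """Allow only if a grant covers the whole workspace or this specific item."""
        workspace_grant = (principal, workspace)
        item_grant = (principal, f"{workspace}/{item}")
        return workspace_grant in self.read_grants or item_grant in self.read_grants


if __name__ == "__main__":
    policy = Policy()
    policy.grant_read("analysts", "Sales")                    # whole workspace
    policy.grant_read("data-scientists", "Finance/Invoices")  # one item only
    print(policy.can_read("data-scientists", "Finance", "Budget"))
```

Denying by default means widening access is always an explicit, auditable grant rather than a side effect of data landing in a shared location.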
Semantic models
- Semantic models are discussed as the layer that helps consumers work with consistent definitions and business meaning.
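The value of a semantic model is that a measure such as net revenue is defined once and reused by every report or query. The following sketch shows that idea in plain Python: measures live in one registry and consumers evaluate them by name. It is not a Fabric or Power BI API, and the measure names, formulas, and rows are hypothetical.

```python
# Sketch: a semantic layer defines business measures once, by name, so every
# consumer computes them the same way. Measure names, formulas, and sample rows
# are hypothetical and not tied to any Fabric API.
from typing import Any, Callable

Row = dict[str, Any]
Measure = Callable[[list[Row]], float]

MEASURES: dict[str, Measure] = {
    "Gross Revenue": lambda rows: sum(r["amount"] for r in rows),
    "Net Revenue": lambda rows: sum(r["amount"] - r["discount"] for r in rows),
    "Order Count": lambda rows: float(len(rows)),
}


def evaluate(measure: str, rows: list[Row], where: Callable[[Row], bool] = lambda r: True) -> float:
    """Evaluate a named measure over the rows that pass an optional filter."""
    if measure not in MEASURES:
        raise KeyError(f"unknown measure {measure!r}")
    return MEASURES[measure]([r for r in rows if where(r)])


if __name__ == "__main__":
    orders = [
        {"region": "EU", "amount": 100.0, "discount": 10.0},
        {"region": "US", "amount": 50.0, "discount": 0.0},
    ]
    print(evaluate("Net Revenue", orders))
    print(evaluate("Gross Revenue", orders, lambda r: r["region"] == "EU"))
```

Changing a formula in the registry changes it for every consumer at once, which is the consistency the semantic layer is meant to provide.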
Intelligence for AI
- The video connects the virtualized data layer to AI enablement by making data:
  - Easier to find
  - Easier to access
  - More consistently defined