Microsoft Fabric Architecture: A Deep Dive Into the Unified Analytics Platform
A comprehensive technical deep dive into Microsoft Fabric's architecture, covering OneLake, lakehouses, warehouses, data pipelines, and how the components work together.
Al Rafay Consulting · ARC Team · Updated February 25, 2026
What Makes Fabric Different
Microsoft Fabric is not simply a rebrand of existing Azure data services. It represents a fundamental architectural shift: a unified analytics platform built on a single data lake (OneLake) with integrated compute engines for every analytics workload — data engineering, data warehousing, real-time intelligence, data science, and business intelligence.
Before Fabric, building an enterprise analytics platform on Azure meant provisioning and integrating separate services: Azure Data Factory for pipelines, Azure Synapse for warehousing, Azure Databricks for data science, Azure Data Lake Storage for storage, and Power BI for visualization. Each service had its own security model, storage format, metadata catalog, and billing structure.
Fabric collapses this complexity into a single SaaS platform with shared governance, a single security model, and one copy of the data.
OneLake: The Foundation
What Is OneLake?
OneLake is Fabric’s built-in data lake, analogous to OneDrive for data. Key characteristics:
- One lake per tenant — every Fabric capacity in your organization shares a single OneLake instance
- Built on Azure Data Lake Storage Gen2 — full ADLS Gen2 compatibility with hierarchical namespace
- Open format — data is stored in Delta Lake (Parquet + transaction log), an open standard accessible by any tool
- Automatic provisioning — no storage accounts to create, manage, or secure separately
- Multi-cloud shortcuts — create references to data in AWS S3 or Google Cloud Storage without copying it
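Because OneLake speaks the ADLS Gen2 API and stores tables in the open Delta format, any Delta-capable engine can read lakehouse data directly. Here is a minimal PySpark sketch, run from a Fabric notebook (which handles authentication for you); the workspace and lakehouse names are hypothetical:

```python
# Minimal sketch: read a lakehouse Delta table through OneLake's
# ADLS Gen2-compatible endpoint. "SalesAnalytics" and "SalesLakehouse"
# are hypothetical names; substitute your own workspace and lakehouse.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-created in Fabric notebooks

# OneLake paths follow the pattern:
# abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<type>/Tables/<table>
path = (
    "abfss://SalesAnalytics@onelake.dfs.fabric.microsoft.com/"
    "SalesLakehouse.Lakehouse/Tables/orders"
)

orders = spark.read.format("delta").load(path)
orders.show(5)
```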
OneLake Hierarchy
OneLake organizes data in a familiar hierarchy:
```
OneLake (Tenant)
├── Workspace: Sales Analytics
│   ├── Lakehouse: SalesLakehouse
│   │   ├── Tables/
│   │   │   ├── customers (Delta table)
│   │   │   ├── orders (Delta table)
│   │   │   └── products (Delta table)
│   │   └── Files/
│   │       ├── raw_data/
│   │       └── staging/
│   ├── Warehouse: SalesWarehouse
│   ├── Semantic Model: SalesModel
│   └── Report: SalesReport
├── Workspace: HR Analytics
│   └── ...
└── Workspace: Finance
    └── ...
```
Shortcuts: Zero-Copy Data Access
Shortcuts are one of OneLake’s most powerful features. A shortcut is a reference to data stored elsewhere — another OneLake location, an ADLS Gen2 account, an S3 bucket, or a GCS bucket. The data is not copied; the shortcut provides a transparent access layer.
Use cases for shortcuts:
- Cross-workspace data sharing — the Finance workspace creates a shortcut to the Sales lakehouse’s customer table without duplicating data
- Hybrid cloud — reference data in AWS S3 alongside data in OneLake without ETL
- Legacy migration — create shortcuts to existing ADLS Gen2 storage while gradually migrating to Fabric-native items
- Data mesh — each domain owns its data in its workspace and publishes shortcuts for cross-domain access
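To make the cross-workspace sharing case concrete, here is a sketch against the OneLake shortcuts REST API. Every GUID and the bearer token are placeholders you would obtain from your own tenant and Microsoft Entra ID:

```python
# Sketch: create a cross-workspace shortcut with the OneLake shortcuts
# REST API. All GUIDs and the bearer token are placeholders; acquire a
# token via Microsoft Entra ID (e.g., MSAL) with the Fabric API scope.
import requests

FINANCE_WS = "<finance-workspace-guid>"        # workspace consuming the data
FINANCE_LAKEHOUSE = "<finance-lakehouse-guid>"

body = {
    "path": "Tables",   # where the shortcut appears in the consuming lakehouse
    "name": "customers",
    "target": {
        "oneLake": {
            "workspaceId": "<sales-workspace-guid>",  # data-owning workspace
            "itemId": "<sales-lakehouse-guid>",
            "path": "Tables/customers",
        }
    },
}

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{FINANCE_WS}"
    f"/items/{FINANCE_LAKEHOUSE}/shortcuts",
    headers={"Authorization": "Bearer <token>"},
    json=body,
)
resp.raise_for_status()
```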
Compute Engines
Data Engineering (Apache Spark)
Fabric provides a fully managed Apache Spark environment for large-scale data processing:
- Spark pools — auto-scaling Spark clusters with Starter pools (instant start) and custom pools
- Notebooks — interactive development with Python, Scala, R, and SparkSQL
- Spark job definitions — scheduled batch jobs for production pipelines
- VS Code integration — develop locally and deploy to Fabric
- Libraries — install custom Python and R packages per workspace or session
- Lakehouse integration — Spark reads and writes directly to OneLake Delta tables
Key architectural detail: Fabric Spark uses a shared metadata layer. When Spark writes a Delta table to a lakehouse, that table is immediately visible in the SQL endpoint and Power BI — no ETL, no sync, no delay.
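A short sketch of what this looks like in practice: a notebook attached to a lakehouse writes a Delta table, and that table is immediately queryable from the SQL endpoint and usable in Power BI. Table and column names are illustrative:

```python
# Sketch: write a Delta table from a Fabric notebook attached to a lakehouse.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

orders = spark.createDataFrame(
    [(1, "2026-01-15", 250.0), (2, "2026-01-16", 99.5)],
    schema="order_id INT, order_date STRING, amount DOUBLE",
)

# saveAsTable registers the table in the lakehouse's shared metadata layer,
# making it visible to the SQL endpoint and Power BI without any sync step.
orders.write.format("delta").mode("overwrite").saveAsTable("orders")
```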
Data Warehouse
Fabric’s data warehouse provides a full T-SQL experience:
- T-SQL compatibility — familiar SQL Server syntax for queries, views, stored procedures, and functions
- Distributed query engine — columnar storage with distributed processing for fast analytical queries
- Cross-database queries — query across warehouses and lakehouses within the same workspace
- Clone tables — zero-copy table clones for development and testing
- Time travel — query data as it existed at a previous point in time
Warehouse vs. Lakehouse SQL Endpoint:
| Feature | Warehouse | Lakehouse SQL Endpoint |
|---|---|---|
| Write operations (INSERT, UPDATE, DELETE) | Full DML support | Read-only |
| Stored procedures | Yes | No |
| T-SQL views | Read-write | Read-only |
| Security (row-level, column-level) | Full support | Limited |
| Performance optimization | Manual (statistics, indexes) | Automatic |
| Best for | Complex transformations, reporting | Exploration, ad-hoc queries |
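As a sketch of the clone and time-travel features above, the following connects to a warehouse's TDS endpoint with pyodbc and issues Fabric T-SQL; the server name, database, and timestamp are placeholders:

```python
# Sketch: exercise warehouse features over the TDS endpoint with pyodbc.
# Fabric warehouses accept standard SQL Server connections with Entra ID auth.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-endpoint>.datawarehouse.fabric.microsoft.com;"
    "Database=SalesWarehouse;"
    "Authentication=ActiveDirectoryInteractive;"
)
cur = conn.cursor()

# Zero-copy clone for a dev/test copy of a table (Fabric T-SQL syntax).
cur.execute("CREATE TABLE dbo.orders_dev AS CLONE OF dbo.orders;")

# Time travel: query the table as it existed at a past point in time.
cur.execute(
    "SELECT COUNT(*) FROM dbo.orders "
    "OPTION (FOR TIMESTAMP AS OF '2026-02-01T00:00:00');"
)
print(cur.fetchone()[0])
conn.commit()
```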
Real-Time Intelligence
Fabric’s real-time analytics engine (based on Azure Data Explorer / Kusto) handles streaming data:
- Eventstreams — ingest data from Azure Event Hubs, Kafka, IoT Hub, custom apps, and database change data capture
- KQL Database — store and query streaming data with Kusto Query Language
- Real-time dashboards — live visualizations that update as data arrives
- Reflexes — event-driven triggers that fire actions based on data conditions
Architecture pattern for IoT monitoring:
```
IoT Devices → Event Hubs → Eventstream → KQL Database → Real-Time Dashboard
                               │
                               └→ Reflex (alert when temperature > threshold)
```
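One way to feed such a pipeline: an Eventstream with a custom-endpoint source exposes an Event Hubs-compatible connection string, so producers can push events with the azure-eventhub SDK. The connection string, entity name, and payload below are placeholders:

```python
# Sketch: push telemetry into an Eventstream custom-endpoint source using
# its Event Hubs-compatible connection string (copied from the Eventstream
# UI; the values below are placeholders).
import json
from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="<eventstream-custom-endpoint-connection-string>",
    eventhub_name="<entity-name>",
)

batch = producer.create_batch()
batch.add(EventData(json.dumps({"device_id": "sensor-01", "temperature": 87.2})))
producer.send_batch(batch)
producer.close()
```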
Data Science
Fabric integrates data science capabilities:
- Notebooks with MLflow tracking for experiment management
- Models registered in the Fabric model registry
- PREDICT function for scoring models directly in T-SQL or Spark
- Semantic link for accessing Power BI semantic models from notebooks
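A minimal sketch of experiment tracking with the MLflow integration in a Fabric notebook; the experiment name, model, and synthetic data are all illustrative:

```python
# Sketch: MLflow experiment tracking in a Fabric notebook (mlflow is
# preinstalled; the experiment appears as a Fabric experiment item).
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

mlflow.set_experiment("churn-experiments")

X, y = make_classification(n_samples=500, n_features=8, random_state=42)

with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Registering the model surfaces it as a Fabric ML model item.
    mlflow.sklearn.log_model(model, "model", registered_model_name="churn-model")
```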
Data Factory
Fabric Data Factory handles data movement and orchestration:
- Dataflows Gen2 — Power Query-based transformations (low-code ETL)
- Data pipelines — orchestration workflows similar to Azure Data Factory
- Copy activity — move data from 100+ source connectors into OneLake
- Scheduling — cron-based and event-based pipeline triggers
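Beyond scheduled and event-based triggers, pipelines can be run programmatically. A sketch using the Fabric REST API's on-demand job endpoint; the GUIDs and token are placeholders:

```python
# Sketch: trigger a data pipeline on demand via the Fabric job scheduler API.
import requests

WORKSPACE = "<workspace-guid>"
PIPELINE = "<pipeline-item-guid>"

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE}"
    f"/items/{PIPELINE}/jobs/instances?jobType=Pipeline",
    headers={"Authorization": "Bearer <token>"},
)
resp.raise_for_status()  # 202 Accepted; the job status URL is in the headers
print(resp.headers.get("Location"))
```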
The Medallion Architecture in Fabric
The medallion architecture (Bronze → Silver → Gold) is the recommended pattern for organizing data in Fabric:
Bronze Layer (Raw)
- Raw data ingested from source systems without transformation
- Stored in the lakehouse Files section or as Delta tables
- Full fidelity — preserves the exact data as received from the source
- Serves as the system of record for auditability
Silver Layer (Curated)
- Cleaned, validated, and standardized data
- Data type enforcement, null handling, deduplication
- Conformed dimensions (consistent customer IDs, product codes)
- Stored as Delta tables in the lakehouse Tables section
Gold Layer (Business-Ready)
- Aggregated, enriched data optimized for specific business use cases
- Star schema models for reporting (fact and dimension tables)
- Pre-computed metrics and KPIs
- Consumed by Power BI semantic models and operational applications
```
Sources → Bronze (raw) → Silver (curated) → Gold (business-ready) → Power BI
              ↑                 ↑                   ↑                   ↑
         Data Factory       Spark/SQL           Spark/SQL           Semantic
          Pipelines/        Notebooks           Notebooks            Models
          Dataflows
```
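A typical bronze-to-silver step, sketched in PySpark with hypothetical table and column names: enforce types, handle nulls, and deduplicate before writing a silver Delta table.

```python
# Sketch: bronze-to-silver transformation. Assumes a default lakehouse is
# attached to the notebook, so relative Tables/ paths resolve against it.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.read.format("delta").load("Tables/bronze_orders")

silver = (
    bronze
    .withColumn("order_date", F.to_date("order_date"))   # enforce types
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("order_id").isNotNull())                # drop bad rows
    .dropDuplicates(["order_id"])                         # deduplicate
)

silver.write.format("delta").mode("overwrite").saveAsTable("silver_orders")
```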
Security Architecture
Workspace-Level Security
- Roles: Admin, Member, Contributor, Viewer
- Microsoft Entra ID integration — assign roles to users, groups, and service principals
- Workspace identity — managed identity for accessing external resources
Item-Level Security
- OneLake data access roles — control who can read specific folders and tables within a lakehouse
- Row-level security (RLS) — filter data rows based on user identity in warehouses and semantic models
- Column-level security (CLS) — restrict access to sensitive columns in warehouses
- Object-level security (OLS) — hide tables and columns from users in semantic models
- Dynamic data masking — mask sensitive data (SSN, email) in query results
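As an illustration of row-level security in a warehouse, this sketch creates a predicate function and security policy over the TDS endpoint using standard SQL Server RLS syntax, which Fabric warehouses support; all object and column names are hypothetical:

```python
# Sketch: define row-level security in a warehouse. The predicate maps each
# user's Entra sign-in name to their own rows via a hypothetical owner_upn
# column on dbo.orders.
import pyodbc

conn = pyodbc.connect("<warehouse-connection-string>")  # as in the earlier sketch
cur = conn.cursor()

cur.execute("CREATE SCHEMA Security;")
cur.execute("""
CREATE FUNCTION Security.fn_rls_orders(@owner AS VARCHAR(128))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN SELECT 1 AS fn_result WHERE @owner = USER_NAME();
""")
cur.execute("""
CREATE SECURITY POLICY Security.OrdersFilter
ADD FILTER PREDICATE Security.fn_rls_orders(owner_upn) ON dbo.orders
WITH (STATE = ON);
""")
conn.commit()
```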
Network Security
- Private endpoints — access Fabric from within your VNet
- Managed private endpoints — connect Fabric to data sources via private network
- Trusted workspace access — allow specific workspaces to access secured storage accounts
Capacity and Licensing
Fabric uses capacity-based licensing measured in Capacity Units (CUs):
| SKU | CU | Spark VCores | Max Memory | Approximate Monthly Cost |
|---|---|---|---|---|
| F2 | 2 | 4 | 6 GB | ~$262 |
| F4 | 4 | 8 | 12 GB | ~$525 |
| F8 | 8 | 16 | 24 GB | ~$1,049 |
| F16 | 16 | 32 | 48 GB | ~$2,099 |
| F32 | 32 | 64 | 96 GB | ~$4,197 |
| F64 | 64 | 128 | 192 GB | ~$8,395 |
All workloads — Spark, SQL, pipelines, Power BI — share the same capacity pool. Fabric uses bursting to temporarily exceed your CU allocation for short workloads, and smoothing to average consumption over time, so you do not need to provision for peak demand.
Pause and resume: Fabric capacities can be paused when not in use, which is particularly valuable for development and testing environments.
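Pausing can be automated through Azure Resource Manager. A sketch with the requests library; the subscription, resource group, capacity name, token, and api-version are assumptions to adapt to your environment:

```python
# Sketch: pause a Fabric capacity through the ARM suspend endpoint.
# All identifiers are placeholders; an ARM-scoped bearer token is required.
import requests

url = (
    "https://management.azure.com/subscriptions/<sub-id>"
    "/resourceGroups/<rg>/providers/Microsoft.Fabric/capacities/<capacity>"
    "/suspend?api-version=2023-11-01"
)
resp = requests.post(url, headers={"Authorization": "Bearer <arm-token>"})
resp.raise_for_status()  # use the corresponding /resume endpoint to restart
```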
Design Patterns
Pattern 1: Centralized Data Platform
One team manages a central Fabric capacity with shared lakehouses:
- Central data engineering team owns ingestion, transformation, and governance
- Business teams consume data through Power BI semantic models
- Simple governance but potential bottleneck on the central team
Pattern 2: Data Mesh
Each domain owns its data in dedicated workspaces:
- Sales, Finance, HR each have their own workspace with lakehouses
- Domains publish curated datasets via OneLake shortcuts
- Central governance team manages tenant-level policies and standards
- More autonomous but requires mature data culture
Pattern 3: Hub and Spoke
Hybrid approach:
- Central hub workspace manages shared reference data and enterprise-wide transformations
- Domain-specific spoke workspaces for team-level analytics
- Shortcuts connect spokes to hub data without duplication
Migration Path
For organizations running Azure Synapse, Azure Data Factory, or Databricks:
1. Assessment — inventory existing pipelines, datasets, and consumers
2. OneLake shortcuts — create shortcuts to existing ADLS Gen2 storage for immediate access in Fabric
3. Migrate pipelines — convert ADF pipelines to Fabric Data Factory pipelines (high compatibility)
4. Migrate notebooks — port Databricks or Synapse Spark notebooks to Fabric (Spark API compatible)
5. Migrate Power BI — existing Power BI workspaces can be assigned to Fabric capacities
6. Decommission — retire legacy services once Fabric workloads are validated
Next Steps
Microsoft Fabric’s unified architecture reduces the complexity and cost of enterprise analytics, but realizing its benefits requires thoughtful design — from OneLake organization and security to capacity planning and workload optimization.
Al Rafay Consulting helps organizations design, migrate to, and optimize Microsoft Fabric deployments. Whether you are evaluating Fabric for a new project or planning a migration from existing Azure data services, our team brings hands-on experience across every Fabric workload.
Al Rafay Consulting
ARC Team
AI-powered Microsoft Solutions Partner delivering enterprise solutions on Azure, SharePoint, and Microsoft 365.