My Tech Learning: August 2025

Tuesday, August 19, 2025

The Power of Lakehouses and Warehouses in Microsoft Fabric

In everyone's Microsoft Fabric journey, you need to make a critical decision point -- Where should your data live ?

In Fabric, you have two primary items or "workhorses" for storing and querying your structured data: the Lakehouse and the Warehouse. Because both store data in the open Delta Lake format and live in OneLake, the choice isn't about storage—it's about the engine you want to use and the persona of the person doing the work.

The Lakehouse: The Data Engineer's Playground

The Lakehouse is the "Swiss Army Knife" of Fabric. It is designed for flexibility, allowing you to store structured tables alongside semi-structured (JSON, XML) and unstructured data (images, PDFs, logs).

Primary Engine: Apache Spark.
Primary Languages: Python (PySpark), Scala, R, and Spark SQL.
Best For: Data engineering, big data processing, and Data Science/ML workloads.

Key Feature: The SQL Analytics Endpoint Every Lakehouse automatically comes with a SQL Analytics Endpoint. This is a read-only gateway that allows SQL-savvy users to query the Lakehouse tables using familiar T-SQL, without needing to know a line of Python.

The Data Warehouse: The SQL Pro’s Powerhouse

If the Lakehouse is a "playground," the Data Warehouse is a high-performance relational engine. It is designed for developers who want the full power of T-SQL (Transactional SQL) and traditional data warehousing capabilities like primary keys, foreign keys, and complex transactions.

Primary Engine: Distributed SQL Engine (Polaris).
Primary Language: T-SQL (Transact-SQL).
Best For: Enterprise data warehousing, complex SQL transformations, and users migrating from SQL Server or Synapse Dedicated Pools.

Key Feature: Unlike the Lakehouse SQL Endpoint, which is read-only, the Data Warehouse is fully read-write. You can use INSERT, UPDATE, and DELETE commands directly against your tables using standard SQL scripts.

The beauty of Fabric is that both tools use the Delta Lake format. This means a Power BI report can look at data in a Lakehouse and a Warehouse simultaneously without any "data movement."

Feature	Lakehouse	Data Warehouse
Primary Persona	Data Engineers / Scientists	SQL Developers / Analysts
Primary Engine	Apache Spark	SQL Engine
Write Support	Spark (Python/Scala), Dataflow	T-SQL (DML), Dataflow, Pipelines
Read Support	Spark & T-SQL (Read-only)	T-SQL (Read-Write)
Structure	Files + Folders + Tables	Relational Tables only
Multi-table Transactions	Limited (Spark level)	Full ACID Support

In a real-world Fabric implementation, you don't usually pick just one.

Most architects follow this flow:

Lakehouse for the "Bronze" and "Silver" layers (raw ingestion and Spark-based cleaning).
Data Warehouse for the "Gold" layer (curated, highly structured data ready for business users to query with SQL).

Here is a tip :

Choose Lakehouse if: You have semi-structured data (JSON), love Python/Notebooks, or are doing Machine Learning.
Choose Warehouse if: You are building a star schema, need full SQL "Write" capabilities, or are migrating a traditional SQL data warehouse.

Wednesday, August 06, 2025

Everything You Need to Know About On-Premise SQL Server Mirroring in Microsoft Fabric

Microsoft Fabric is rapidly changing the data landscape, bringing together analytics, data science, and business intelligence into a single, unified platform. One of its most exciting features for organisations with existing on-premises SQL Server deployments is Mirroring. This capability offers a near real-time, low-friction way to bring your transactional data into the Fabric ecosystem for immediate analytics, reporting, and AI workloads.

At the time of writing this post, this feature is still in Preview. This means this feature cannot be used in Production scenarios.

But what exactly is it, how does it work, and what should you be aware of before diving in?

Let's break it down.

What is SQL Server Mirroring in Microsoft Fabric?

At its core, SQL Server Mirroring in Fabric provides a near real-time, low-latency replication mechanism for your on-premises SQL Server databases directly into Microsoft OneLake, Fabric's unified data lake. It creates a "Mirrored Database" item in your Fabric workspace, which then exposes a SQL Analytics Endpoint and a Default Semantic Model.

This means your live transactional data is instantly available for the below use cases.

Querying with T-SQL using the SQL Analytics Endpoint.
Building Power BI reports with Direct Lake mode for unparalleled performance.
Leveraging Spark notebooks for advanced analytics and transformations.
Creating operational dashboards that reflect the latest business transactions.

The magic behind this is that Fabric automatically uses Change Data Capture (CDC) for SQL Server 2016-2022 or a native change feed for SQL Server 2025 to intelligently track and synchronise changes, avoiding the need for complex ETL pipelines for initial data ingestion.

The Power of Mirroring: Key Benefits

You can say goodbye to nightly batch jobs. Mirroring ensures your analytics are always based on the freshest possible data.
No more manual pipeline creation for initial loads and incremental updates. Fabric handles the replication automatically.
Your SQL Server data lands directly in OneLake, making it accessible across all Fabric personas (Data Engineering, Data Science, Data Warehousing, BI).
The actual data replication process itself (the initial snapshot and ongoing CDC) does not consume your Fabric Capacity Units (CU). You only pay for the compute when you query or process the data in Fabric.
For supported tables, Power BI datasets automatically leverage Direct Lake, offering the performance of Import Mode with the freshness of DirectQuery.
The replication is designed to be low impact on your production SQL Server, using CDC mechanisms that are highly optimised.

What You Need to Know: Requirements & Limitations

While incredibly powerful, mirroring isn't a silver bullet, understanding its current requirements and limitations is crucial for a successful implementation.

On-Premises SQL Server Requirements -

SQL Server 2016 through 2022 (using CDC) and SQL Server 2025 (using native change feed).
Mirroring is supported only on the primary replica of an Always On Availability Group.
SQL Server on Windows is fully supported. SQL Server on Linux (2017 CU18+ to 2022) is also supported.
A Microsoft Data Gateway (Standard or VNET) is required for secure communication between Fabric and your on-premises SQL Server.

Database & Table Limitations:

Single Mirror Target: A database can only be mirrored to one Fabric workspace at a time.
No Redundant CDC (SQL Server 2025): If you're on SQL Server 2025, do not manually enable CDC on the source database; Fabric handles this for you.
No Delayed Transaction Durability: Databases with this setting cannot be mirrored.
500 Table Limit: Each mirrored database can replicate a maximum of 500 tables. Fabric will pick the first 500 alphabetically if you choose "Mirror all data."
Unsupported Data Types: This is a big one! Tables containing columns with types like image, text, ntext, xml, timestamp/rowversion, sql_variant, geometry, geography, hierarchyid, json, vector, or User-Defined Types (UDT) cannot be fully mirrored. The presence of these types will cause the entire column, or sometimes the whole table, to be excluded.
Precision Loss: datetime2(7) and datetimeoffset(7) columns will lose their 7th digit of precision (becomes 6 digits) when mirrored to OneLake's Delta Lake format.
No Row/Column Filtering: Mirroring is an "all or nothing" proposition for selected tables. You cannot filter specific rows or exclude individual columns during replication.
DDL Changes: Significant schema changes (like altering a Primary Key) will often trigger a full re-sync of the entire table, leading to higher latency for that specific table until the re-sync completes.

Measuring Capacity Usage: Understanding the Costs

Replication is FREE (Compute-wise) -- The initial snapshot and continuous Change Data Capture (CDC) processing itself does not consume your Fabric Capacity Units (CU). This is a significant advantage over other replication methods.
You Pay for Data Usage -- Your Fabric Capacity Units are consumed when you -

Query the mirrored data via the SQL Analytics Endpoint.
Interact with Power BI reports built on the mirrored data (especially in Direct Lake mode).
Process the data using Spark notebooks or other Fabric experiences.

Storage Costs -- You get a generous 1 TB of free mirroring storage for every 1 Capacity Unit (CU) you own. For example, an F64 capacity gets 64 TB of free storage for mirrored data. You only start paying for OneLake storage if you exceed this allowance or if your Fabric Capacity is paused (storage charges still apply).

How to Monitor ?

Use the Microsoft Fabric Capacity Metrics App to track your CU consumption and storage usage for "Mirrored Database" items. This app will clearly show you what's driving your costs. You can also leverage Workspace Monitoring (via KQL Databases) to track MirroredDatabaseTableExecution logs for metrics like ProcessedBytes, ProcessedRows, and Latency to understand the operational health and volume of your replication.

Prerequisites to Getting Started with Mirroring

Verify your SQL Server version, Always On Availability Group configuration, and install a Data Gateway if needed.
Fabric will automatically enable CDC on your source database (or use the native change feed for SQL Server 2025). Ensure your SQL Server instance has the necessary permissions.
In your Fabric workspace, select "New > More Options > Mirrored Database (SQL Server)."
Provide your SQL Server connection details and gateway information.
Choose which tables you want to mirror. Fabric will automatically identify and flag any tables with unsupported data types.
Keep an eye on the replication status in Fabric and use the Capacity Metrics App to track usage.

Conclusion

On-premises SQL Server Mirroring in Microsoft Fabric is a game-changer for organisations looking to bridge the gap between their transactional systems and modern analytics. By providing near real-time, low-cost data ingestion directly into OneLake, it unlocks a world of possibilities for faster insights and more agile data operations. While you need to be mindful of its current limitations, the benefits of quick, seamless integration make it an invaluable tool in your Fabric journey.

Tuesday, August 19, 2025

The Power of Lakehouses and Warehouses in Microsoft Fabric

Wednesday, August 06, 2025

Everything You Need to Know About On-Premise SQL Server Mirroring in Microsoft Fabric

Medallion Architecture in Fabric