Microsoft Fabric is designed to integrate multiple analytics workloads, including data engineering, data science, and business intelligence, within a unified platform. While our current Medallion structure already has a logical progression of data refinement (Bronze → Silver → Gold), there are key differences in architecture, tooling, and approach that we might consider while emulating a Fabric-ready structure:
Comparison Between Medallion and Fabric Layout
Aspect | Medallion Structure | Microsoft Fabric Structure |
---|---|---|
Bronze Layer | Raw data landing in containers (often flat files, such as CSV or JSON). | Lakehouse raw zone: Raw data lands in OneLake, Fabric’s unified data lake. Files can be kept in their original format and are typically loaded into Delta tables for optimised analytics. |
Silver Layer | Parquet-based curated data in a data lake. | Lakehouse refined zone: Curated and enriched data stored in Delta format, ready for more direct query workloads. |
Gold Layer | SQL queries executed on demand for Power BI views. | Warehouse: Fully optimised data warehouse tables, directly accessible by Power BI and other services. |
File Formats | Parquet (Silver). | Delta is Fabric’s default table format, supporting versioning (time travel), ACID transactions, and incremental or streaming updates. |
Query Mechanism | SQL-on-demand through get.myview. | SQL queries supported across all Fabric workloads, tightly integrated with the Warehouse, Lakehouse, and Power BI datasets. |
Data Storage | Hierarchical Azure Storage Containers. | All data centrally stored in OneLake, which provides a single namespace for data assets across services. |
Integration with BI | Power BI views depend on external SQL queries (Gold). | Power BI deeply integrated with Fabric, allowing live connections to data in the Warehouse or Lakehouse. |
Data Engineering | Data flows typically managed externally (e.g., ADF pipelines). | Built-in pipelines for dataflows, Spark notebooks, and transformations within Fabric’s unified engineering framework. |
Governance | Separate tools (myBMT) for metadata management, governance, and lineage tracking. | Microsoft Purview integration for lineage, governance, and unified cataloguing within Fabric. |
Steps to Emulate a Fabric-Ready Structure
1. Adopt Delta Format
- Convert the Parquet files in the Silver layer, and any Gold outputs that are materialised as files, to Delta format to align with Fabric’s use of Delta Lake. Delta supports version control, ACID transactions, and efficient updates and merges, which are key for Fabric compatibility.
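As a minimal sketch of what the conversion step could look like, the helper below builds Delta Lake `CONVERT TO DELTA` statements for a set of Parquet directories. The dataset paths and partition columns are hypothetical placeholders, not our actual layout, and the helper only generates the SQL text; it assumes the statements would later be run in a Spark session with Delta Lake enabled.

```python
from typing import Optional, Sequence, Tuple


def convert_to_delta_sql(
    parquet_dir: str,
    partition_cols: Optional[Sequence[Tuple[str, str]]] = None,
) -> str:
    """Build a Delta Lake CONVERT TO DELTA statement for one Parquet directory."""
    if "`" in parquet_dir:
        raise ValueError("path must not contain backticks")
    stmt = f"CONVERT TO DELTA parquet.`{parquet_dir}`"
    # Partitioned Parquet directories need their partition schema spelled out.
    if partition_cols:
        cols = ", ".join(f"{name} {dtype}" for name, dtype in partition_cols)
        stmt += f" PARTITIONED BY ({cols})"
    return stmt


# Hypothetical Silver datasets (assumption): path -> partition columns or None.
silver_datasets = {
    "silver/sales": [("year", "INT")],
    "silver/customers": None,
}

statements = [convert_to_delta_sql(p, cols) for p, cols in silver_datasets.items()]
for s in statements:
    print(s)
```

Each generated statement could then be passed to `spark.sql(...)`. The conversion works in place by writing a transaction log next to the existing Parquet files, so it is worth trying on a copy of one dataset first.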
2. Unify Storage
- If possible, mimic OneLake by creating a unified namespace or using a centralised storage account with logical zones for raw, refined, and curated data.
- Maintain directory structures similar to Fabric zones (e.g., Raw, Refined, Enriched).
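One way to keep zone paths consistent is to generate them from a single helper rather than hand-writing them in each pipeline. The sketch below assumes an ADLS Gen2 storage account addressed with the standard `abfss://` URI scheme; the account name, container name, and the Bronze/Silver/Gold to Raw/Refined/Enriched mapping are illustrative assumptions, not our current configuration.

```python
import posixpath

# Assumed mapping from Medallion layers to Fabric-style zone directories.
ZONE_FOR_LAYER = {"bronze": "Raw", "silver": "Refined", "gold": "Enriched"}


def zone_path(account: str, container: str, layer: str, dataset: str) -> str:
    """Return the abfss:// path for a dataset inside its zone directory."""
    zone = ZONE_FOR_LAYER[layer.lower()]  # KeyError for an unknown layer
    root = f"abfss://{container}@{account}.dfs.core.windows.net"
    # Strip slashes so an absolute-looking dataset name cannot reset the join.
    return posixpath.join(root, zone, dataset.strip("/"))


# Hypothetical account and container names (assumption).
print(zone_path("mystorage", "lake", "Silver", "sales/orders"))
```

Keeping the mapping in one place means a later move to OneLake would mostly involve changing the root URI rather than every pipeline.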
3. Optimise for Direct Query
- Gold layer SQL queries should be optimised for DirectQuery performance in Power BI. Fabric’s Warehouse enables live connections, so each query should respond fast enough for interactive report use rather than only for scheduled refreshes.
4. Incorporate Governance & Metadata
- Fabric’s tight integration with Purview suggests prioritising data governance and lineage. Implement robust metadata tracking and lineage tools for our Medallion structure.
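To make the idea of lineage tracking concrete, here is a small sketch of a lineage record and a function that walks back to every upstream dataset. The record fields and dataset names are assumptions chosen for illustration; this is not how myBMT or Purview stores lineage.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import List


@dataclass
class LineageRecord:
    target: str            # dataset that was produced
    sources: List[str]     # datasets it was produced from
    transformation: str    # short description of the step
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def upstream_of(target: str, records: List[LineageRecord]) -> List[str]:
    """Collect every direct and indirect source of `target`."""
    by_target = {r.target: r for r in records}
    seen: List[str] = []
    stack = [target]
    while stack:
        rec = by_target.get(stack.pop())
        if rec is None:
            continue  # no record: treat as an original source
        for src in rec.sources:
            if src not in seen:
                seen.append(src)
                stack.append(src)
    return seen


# Hypothetical datasets (assumption).
records = [
    LineageRecord("Refined/sales", ["Raw/sales_csv"], "clean and type-cast"),
    LineageRecord(
        "Enriched/sales_summary",
        ["Refined/sales", "Refined/customers"],
        "join and aggregate",
    ),
]
print(upstream_of("Enriched/sales_summary", records))
print(json.dumps(asdict(records[0]), indent=2))
```

Records in a simple, serialisable shape like this would be straightforward to export to whichever catalogue we eventually use.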
5. Integrate with Power BI
- The get.myview function works similarly to Fabric’s integrated access. Maintain or enhance this functionality to allow seamless switching to Fabric’s direct integration in the future.
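One possible way to keep that switch cheap is to route view lookups through a small registry, so reports ask for a view by name and only the registry knows where it currently lives. The sketch below is hypothetical throughout: it does not describe how get.myview is implemented, and the backend labels, view name, and reference strings are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ViewSource:
    backend: str    # assumed labels: "sql_on_demand" today, "fabric_warehouse" later
    reference: str  # query or table identifier understood by that backend


# Hypothetical registry entry (assumption).
VIEW_REGISTRY: Dict[str, ViewSource] = {
    "sales_summary": ViewSource("sql_on_demand", "gold.sales_summary_query"),
}


def resolve_view(name: str) -> ViewSource:
    """Look up where a named view is currently served from."""
    try:
        return VIEW_REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown view: {name}") from None


def repoint(name: str, backend: str, reference: str) -> None:
    """Switch a view to a different backend without changing its callers."""
    VIEW_REGISTRY[name] = ViewSource(backend, reference)


print(resolve_view("sales_summary").backend)
```

With this kind of indirection, moving a single view to a Fabric Warehouse table would be a one-line `repoint(...)` call, and views could be migrated one at a time.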
6. Experiment with Spark & Pipelines
- Fabric offers Spark notebooks and Data Factory-like pipelines natively. If we rely on Azure Data Factory, consider how Fabric’s pipelines could replicate or improve this functionality.
7. Track Costs & Scalability
- Fabric emphasises scalability with a focus on reduced operational overhead through its unified platform. Align our container usage and data processing workflows with this principle to ensure smooth migration.
Benefits of Emulating Fabric Layout Now
- Future-Ready: Aligning with Fabric principles should reduce disruption if and when we fully transition.
- Improved Performance: Delta format and better query optimisation can enhance overall performance today.
- Unified Data Governance: Pre-emptively strengthening metadata and lineage tracking aligns with Fabric’s integrated governance model.
- Cost Management: Streamlining storage and processing workflows in preparation for Fabric can lead to savings and increased operational efficiency.