Keeping a high volume of data organized is one of the top priorities for a business analyst. You’ve likely heard of (or may already be using) Medallion architecture to achieve these goals. This structure is also considered best practice while designing and using Microsoft Fabric.
I’ve curated an overview of the Medallion Architecture from Microsoft’s resources to help explain the concepts at a high level. Links to the full documentation will be included at the end of this article.
If you have further questions about Medallion architecture or Microsoft Fabric, please contact Superior Consulting Services.
Medallion architecture is a design pattern that's used by organizations to logically organize data. It is comprised of three zones or layers. Each layer indicates the quality of data, with higher levels representing higher quality. This multi-layered approach helps you to build a single source of truth for enterprise data products.
There are three layers that make up the core of this architecture:
At the foundation Of Microsoft Fabric is the OneLake data lake. It comes automatically provisioned with every Fabric tenant, and it's designed to be the single location for all your analytics data. Using OneLake allows us to remove silos and reduce management effort. All organizational data is stored, managed, and secured within one data lake resource.
Benefits include:
The overall medallian structure within OneLake is represented in this graphic. Below, I’ll go through each of the layers in detail.
The bronze layer contains unvalidated data. Data ingested in the bronze layer typically:
Retaining the full, unprocessed history of each dataset in an efficient storage format provides the ability to recreate any state of a given data system.
Additional metadata (such as source file names or recording the time data was processed) may be added to data on ingest for enhanced discoverability, description of the state of the source dataset, and optimized performance in downstream applications.
Recall that while the bronze layer contains the entire data history in a nearly raw state, the silver layer represents a validated, enriched version of our data that can be trusted for downstream analytics.
While Databricks believes strongly in the lakehouse vision driven by bronze, silver, and gold tables, simply implementing a silver layer efficiently will immediately unlock many of the potential benefits of the lakehouse.
For any data pipeline, the silver layer may contain more than one table. The number of required tables will depend on the necessary data.
This gold data is often highly refined and aggregated, containing data that powers analytics, machine learning, and production applications. While all tables in the lakehouse should serve an important purpose, gold tables represent data that has been transformed into knowledge, rather than just information.
Analysts largely rely on gold tables for their core responsibilities, and data shared with a customer would rarely be stored outside this level.
Updates to these tables are completed as part of regularly scheduled production workloads, which helps control costs and allows service level agreements (SLAs) for data freshness to be established.
While the lakehouse doesn’t have the same deadlock issues that you may encounter in an enterprise data warehouse, gold tables are often stored in a separate storage container to help avoid cloud limits on data requests.
In general, becuse aggregations, joins, and filtering are handled before data is written to the gold layer, users should see low latency query performance on data in gold tables.
Medallion architecture is a proven structure for organizing and validating data. Using the included OneLake data lake within Microsoft Fabric, business analysts can flow data through a three layer architecture to distill raw data into knowledge.
For a more detailed plan regarding your databases, contact a data analyst at SCS to discuss your project.
Further Reading:
What is the medallion lakehouse architecture? - Microsoft Build, 03/01/24
Implement medallion lakehouse architecture in Microsoft Fabric - Microsoft Build, 11/15/23
Superior Consulting Services (SCS) is a Microsoft-centric technology firm providing innovative solutions that enable our clients to solve business problems. We offer full-scale data modeling, analytics, and custom app development.