In the self-service part of Microsoft’s BI offering, the only ETL tool available to regular business users is Power Query, via Power BI or Excel. It is a no/low-code interface with hundreds of built-in connectors for easily extracting and transforming your data.
While there are several versions of Power Query, each with its own particularities in how the engine actually executes the M scripts (Excel, Power BI Desktop, the Power BI Service/Fabric for import models, Power BI Dataflow Gen1, Power BI Dataflow Gen2 (DFGen2), Power Platform Dataflows, and others I’m likely forgetting), the most popular one, and the one where you probably already have dozens of ETL logics built, is Power BI Desktop, whose models are then published to the Service/Fabric.
Since it’s bundled with the Semantic Model, it can also be enhanced with calculated DAX columns and tables if desired (although in most cases this is not recommended). Materializing calculated tables can matter when the Semantic Model is used for the formal/official evaluation of an indicator, which might then serve as input for other processes.
Since a Semantic Model runs on the Analysis Services engine, which is an in-memory engine, it’s unreasonable to keep every single semantic model in memory at all times. That’s why there is a model eviction logic that saves the imported data somewhere, so the next time someone connects to the semantic model it can be quickly loaded back into memory (aka transcoding). Before Fabric, this stored data used a proprietary file format. But since the standard way to store tables in Microsoft Fabric is Delta tables, and the underlying Parquet file format is nearly compatible with that proprietary one, adjustments were made so that a Delta table can be read and quickly loaded into Analysis Services memory.
So, in other words, OneLake integration for semantic models is a way to make those evicted files accessible in a place where they can be easily consumed by all other Fabric and non-Fabric engines that can read Delta.
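To get a sense of where those files land, here is a small sketch from a Fabric notebook (the workspace and model names are placeholders, and the path layout assumes the Tables folder that OneLake integration exports to):

```python
# Sketch: list the Delta tables exported by OneLake integration for a semantic model.
# "FinanceWS" and "SalesModel" are placeholder workspace/model names.
from notebookutils import mssparkutils

path = (
    "abfss://FinanceWS@onelake.dfs.fabric.microsoft.com/"
    "SalesModel.SemanticModel/Tables/"
)
for item in mssparkutils.fs.ls(path):
    print(item.name)
```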
The Business User’s Reality
When you’re on the business side, most of the time you either don’t have access to a proper data warehouse, or your source is only available through DFGen1, an import semantic model (whose DAX queries can be ingested into your Power Query, as shown here), or even Excel files encrypted via sensitivity labels.
To properly ingest this data into your Fabric OneLake, there are new alternatives in Fabric: mainly the Copy job (formerly the copy activity from Data Factory’s data pipelines), Python via notebooks (especially using the awesome semantic-link-labs), or DFGen2, where you set a OneLake destination for each table.
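As a rough sketch of the notebook route (assuming a Fabric notebook with an attached Lakehouse and the semantic-link/sempy package available; the dataset, workspace, and table names below are placeholders, not from this article), extracting a table from an import semantic model and landing it in OneLake might look like this:

```python
# Sketch: pull a table out of an import semantic model with a DAX query and
# write it as a Delta table in the attached Lakehouse (the bronze destination).
# "FinanceWS", "SalesModel", 'Customers' and "bronze_customers" are placeholders.
import sempy.fabric as fabric

customers = fabric.evaluate_dax(
    dataset="SalesModel",
    dax_string="EVALUATE 'Customers'",
    workspace="FinanceWS",
)

# Column names typically come back as Table[Column]; strip that for Delta.
customers.columns = [c.split("[")[-1].rstrip("]") for c in customers.columns]

# 'spark' is the session predefined in Fabric notebooks.
spark.createDataFrame(customers) \
    .write.format("delta") \
    .mode("overwrite") \
    .saveAsTable("bronze_customers")
```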
The unintuitive option is actually to use OneLake integration for the semantic model to do this extraction to the bronze layer. And it comes with many benefits over the previous alternatives:
- It uses the most resilient and lightweight Power Query engine: the Semantic Model/Power BI Desktop one.
- You can access Excel files encrypted via sensitivity labels.
- It is compatible with SharePoint-hosted files (usually the files section of an MS Teams team).
- You can develop offline using Power BI Desktop.
- It is fully compatible with Git integration as of October 2024.
- You can set up incremental refresh logic with the most flexible implementation available.
- All of its compute is categorized as background operations, which benefit from 24-hour smoothing.
You can find more details on how to set this up here, here, and here. But mainly, you need to have Fabric enabled on your capacity, set your semantic model to the large semantic model storage format, and enable OneLake integration in the semantic model settings. The next time you refresh it, the data from the import tables of your semantic model will automatically be available for OneLake usage.
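Once that is configured, the refresh that exports the tables can also be triggered programmatically. A minimal sketch with semantic-link (sempy) from a Fabric notebook, using placeholder names:

```python
# Sketch: refresh the OneLake-integrated semantic model so its import tables
# are (re)exported to OneLake as Delta tables. Names are placeholders.
import sempy.fabric as fabric

fabric.refresh_dataset(
    dataset="SalesModel",
    workspace="FinanceWS",
    refresh_type="full",
)
```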
You can then have a OneLake shortcut on your Bronze Lakehouse pointing to your OneLake-integrated semantic model and start consuming it using any engine you desire.
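For example (a minimal sketch; the Lakehouse and table names are placeholders), from a Spark notebook the shortcut behaves like any other Delta table:

```python
# Sketch: query the shortcut in the Bronze Lakehouse that points at the
# OneLake-integrated semantic model. Names are placeholders.
df = spark.sql("SELECT * FROM BronzeLakehouse.customers LIMIT 10")
df.show()
```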
The only point of attention is that with each refresh, the old versions of your Delta table are deleted, meaning you only have the most up-to-date version stored. This might cause transient issues for engines such as a Direct Lake semantic model with automatic updates turned on.
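One possible mitigation (just a sketch with placeholder names, not something prescribed by the setup guides above) is to copy each refresh into a table you own, so downstream engines read a Delta table whose version history you control:

```python
# Sketch: materialize a stable copy of the exported table so downstream readers
# are decoupled from the semantic model's own Delta files and their pruned
# version history. Names are placeholders.
snapshot = spark.read.table("BronzeLakehouse.customers")  # shortcut to the model's export
snapshot.write.format("delta") \
    .mode("overwrite") \
    .saveAsTable("BronzeLakehouse.customers_copy")
```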