In this blog post I’ll talk about alternatives to achieve Workspace Level Refresh in Power BI and Fabric, by using the new available item Data Pipelines or an approach that uses Power Automate to achieve this goal.
From time to time Microsoft announces its intents to new features they’re working on and expected timelines they try to deliver it. I highlight the word try, because when Release 2021 wave 1 was announced, I got really really excited about the Workspace level refresh for datasets and dataflows as you can see here.
Unfortunately, this feature just disappeared out of thin air, but in a Guy in a Cube live stream (I can’t remember the date, but was by the end of 1st semester of 2021), Matthew Roche actually told that they thought they were able to deliver it, but just couldn’t. After that every new release wave emerged, the first thing I’ve searched for was this feature that never came back. Maybe they just knew Fabric was only a couple years to be announced…
The fact is that now with Fabric we have the ultimate orchestration tool: The Data Pipeline item. But between Fabric was released and the only real alternative was to use Power Automate to do so.
If you have Fabric available, just jump to the end of this blog post and see the Data Pipeline’s approach. If not, you can still use some of the techniques we had to do in the old days.
The first attempt to achieve this was the chaining logic behind dataflows
Note: This applies only to Dataflows Gen1. Datafows Gen2, since it’s a Fabric only item, I didn’t observe this following chaining behavior.
As a Premium only feature, we have the ability to refer to linked entities in downstream dataflows gen1 from the same workspace. This produces a beautiful sequence in the lineage view, but also have a interesting effect: internal orchestration of the dataflows refreshes.
Everytime you hit refresh button from an upstream Dataflow, let’s say Dataflow A1, every other dataflow looking forward (B and C) will be refreshed in a sequence.
This sounds great, but have some caveats:
i) If you have some chain that has two sources for the same dataflow, you cannot choose the refresh order.
The chain only looks forward, meaning if you hit refresh to dataflow A1, it will block dataflows B and C to being refreshed and only unblocks when the whole chain completes the refresh. There is no way to say in Power BI Service: Just refresh A1 and A2 in parallel and then refresh B then C. If you try so, you’ll get this error message:
Cannot acquire lock for model ‘WorkspaceID/DataflowID’ because it is currently in use.
So, you need to first refresh the first chain A1 -> B -> C and then, when the first chain ends, refresh the second chain A2 -> B -> C.
ii) It only refreshes the data that was changed by the linked entity.
This means it has some optimizations and the overall refresh time will be faster, but at the same time, if C has an external source, let’s say, some Web source, the chain A1 -> B -> C, when C starts refreshing, depending on how you build your queries, it might not connect to the Web source, only refreshing the internal dataflow C’s queries that depends on the B’s linked data. Sometimes it’s desirable, sometimes it isn’t.
iii) If some of the Dataflows (either A1, B or C) refresh fails, there is a roll back feature that simply doesn’t update none of the dataflows, even if A1 and B were successfully updated. This is both excellent and terrible. Excellent, because most of the times when something goes wrong you actually want your data to roll back to some previous stable stage, but on the other hand it’s a nightmare to debug the error. Let’s imagine the error was in Dataflow C. Since the data of A1 and B was actually rolled back, it’s hard to reproduce the scenario so you can try to fix the problem.
To debug, the best solution is to actually break the link between the dataflow that failed ant its upstream dataflow, refresh the ones that were successful the get the most up to date data and then troubleshoot the problematic one with the actual data it had an error in the first place. The two main alternatives to breaking the links are:
a) Disable load to the linked query
b) Change the way the linked query is accessed, for example, by using parameters.
Instead of writing something like this:
You can write save the GUIDs to parameters and rewrite the queries like this:
We still have a problem. This only solves the Dataflows part of the workspace level refresh. In Power BI we serve data in Datasets through reports, right?
To achieve this last step I’ve seen innumerous people having it’s last dataflow to enhanced compute engine, so it could be used as a direct query source for Power BI Datasets. Please, just don’t do this. You’re not following the best practice of using import and even worse, depending on the way your model is set and your DAX expressions are calculated, inserting a fact table from another data source actually introduces limited relationships that might mess up with previous validated measures. On top of that it’s a sure thing that you will experience worse performance.
Using Power Automate to achieve Workspace Level Refresh
What I didn’t know was there already existed a Power Automate alternative to achieve the same goal announced here which you could use to do the orchestration of the workspace.
We now have the possibility to use a Power BI Dataflows trigger and action. Meaning that I could in a Power Automate flow start a refresh for a dataflow and detect when the refresh finished.
This way you could not only know when a refresh is successful or fails, but also nest several flows together and achieve the workspace level refresh you actually needed. I’d like to thank my colleague Luiz for recently enlightening me that we could achieve it in a SINGLE flow. My ignorance in Power Automate didn’t let me add a trigger in the middle of a flow. So, the pattern you should build using Power Automate, for each flow is:
i) First create a new flow that can be initiated from a Power BI Desktop as you can see here and group this Power Automate within a Scope
ii) Save the scope to clipboard
iii) Paste the template scope to the yes branch as many steps as necessary
In the end, you’ll have a chained set of actions to initiate the refresh as well as the trigger that waits for the refresh to complete. It’s important to note that a complete refresh doesn’t mean it’s succeeded. That’s why we have control and a teams message to alert someone in case it fails.
As last step, way better than user Power BI in Direct Query against your last dataflow, is to start a Power BI Dataset refresh action in Power Automate. The problem is that you can only start a refresh, not knowing when it ends nor if it ends with success or not.
The best workaround to know when the dataset refresh ended was described by Imke here.
The state of the art to the workspace level orchestration: Fabric Data Pipelines
Whether you’re just starting to use Fabric with some existent Power BI workload or using Fabric on it’s full capacity, Data Pipeline is you main orchestrator.
If you already have your solution built on Dataflows Gen1 and imported datasets, you can use Data Pipelines to choose the order of your refresh. Currently we don’t have a refresh dataset specific activity yet, but I’ve heard here that it will be available soon. Meanwhile you can follow some possible, but more complicated pattern here.
Every box is called an activity. By the end of the activity you have 4 possibilities:
on skip in case the activity wasn’t evaluated
on success in case the activity was succeeded
on fail in case the activity was failed
on completion in case the activity was not skipped (therefore it was either succeeded or failed)
So It’s easy to click drag and drop and build your orchestration logic.
Unfortunately, at the time of this writing, the refresh dataset activity isn’t ready yet, but it was demonstrated at Microsoft Power Platform Conference 2023, so it’s coming soon. You can actually leverage the Enhanced refresh with the Power BI REST API to do so by using the web activity.
Data Pipelines are way more powerful than just orchestration, which is the focus of this blog post, and it can also let you use parameters, read metadata to design a really flexible orchestration pipelines and also have a state of the art copy activity that will be added to Dataflow Gen2 soon (please just don’t disappear on me again).
So, in this blog post I give the best alternatives to do a workspace level refresh using Power BI Premium, Power BI + Power Automate and Fabric approaches.