Query performance tuning
When optimizing the performance of Power BI queries, there are a few places to start, and the recommendation is to treat each as a layer of performance tuning to undertake. These layers include reducing the size of the data, optimizing DirectQuery (when used), and optimizing composite models (when used).
Reducing the data size
This technique involves limiting the amount of data that Power BI needs to work with to only what the report requires. For example, if the sales database used as the source for the report contains sales data for 70 countries but the report only needs sales for 5 countries, it makes sense to load data for only those 5 countries. This can be accomplished when connecting to the source database by using a WHERE clause that limits the data to only the rows for the needed countries.
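As a minimal sketch of this idea, assuming a hypothetical Sales table with a Country column (the table, column, and country names are illustrative), a custom query like the following could be supplied when connecting to the source so that only the needed rows are imported:

```sql
-- Hypothetical example: import only the rows for the five countries
-- the report actually needs (object names are illustrative).
SELECT *
FROM Sales
WHERE Country IN ('USA', 'Canada', 'Mexico', 'Germany', 'France');
```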
While this technique works well with rows of data, it can also be applied to columns. It's not uncommon for enterprise databases to have tables that are very wide, that is, contain many columns. Depending on the reporting use case, many of these columns will not be needed and can be left out of the Power BI data model, further reducing the data size. In this case, only the necessary columns can be selected by using the SELECT clause of a custom query when connecting to the data source. It's possible to write a custom query selecting only the rows and columns needed for the report by clicking Advanced options when connecting to the data source in Power BI. Keep in mind that not every data source supports limiting the imported data this way; some sources may require prefiltering the dataset to only the needed data before connecting to it from Power BI.
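Building on the previous sketch, the same hypothetical query can name only the columns the report uses rather than selecting every column in a wide table (again, all names are illustrative):

```sql
-- Hypothetical example: select only the columns the report needs,
-- in addition to filtering the rows (object names are illustrative).
SELECT OrderDate, Country, ProductKey, SalesAmount
FROM Sales
WHERE Country IN ('USA', 'Canada', 'Mexico', 'Germany', 'France');
```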
Depending on the use case, it's also possible to perform precalculations or summarization outside of Power BI before the data is loaded. Where preprocessing or filtering is already needed, it's a worthwhile exercise to check whether any calculations or pre-aggregations can take place on the data before it gets used by Power BI. This has the potential to greatly reduce data sizes, but there may be trade-offs, and depending on the scenario, pre-aggregation may not always be possible or preferred.
For example, suppose the sales database contains detailed transaction-level data (think every single item and price for every single customer purchase). If the retail store had five different sales on the same day to different customers and each of them purchased a bottle of soda, the sales detail table might contain five rows of data, one recording each sale of soda to each customer on that day. The report might not need data at that granular level; instead, a sum of all the sales for that soda for that day, a single row of data, might be sufficient. In this case, we could pre-aggregate the data using a view in the database or a custom SQL query used in Power BI when we connect to the sales database. Both methods result in fewer rows of data in Power BI while still meeting the business requirements of the report.
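As a rough sketch of this approach, assuming a hypothetical SalesDetail table with transaction-level rows, a database view (or an equivalent custom SQL query used when connecting from Power BI) could sum the detail rows down to one row per product per day; the view and column names here are illustrative:

```sql
-- Hypothetical pre-aggregation: one row per product per day instead of
-- one row per individual transaction (object names are illustrative).
CREATE VIEW DailyProductSales AS
SELECT OrderDate,
       ProductKey,
       SUM(Quantity)    AS TotalQuantity,
       SUM(SalesAmount) AS TotalSalesAmount
FROM SalesDetail
GROUP BY OrderDate, ProductKey;
```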
In review, some of the important techniques to remember for reducing data size include the following:
- Reducing the data to only the rows and columns required
- Completing precalculation and summarization prior to use in Power BI
Next, let’s look at optimizing DirectQuery data sources.