2 mins read

DirectQuery optimization – Connecting to Data Sources

DirectQuery optimization

When using DirectQuery, it’s important to remember that the total performance experienced is dependent upon both Power BI and the underlying data source. Many times, Power BI reports will use a data warehouse such as Azure Synapse Analytics as the data source and for enterprise data warehouses like this, there are several considerations to keep in mind when optimizing the performance of DirectQuery. Some or all of these techniques may not always be possible, but they are often worth considering. They are listed here in the order you should consider them:

  1. Indexes – Relational databases support the building of indexes. Indexes allow the efficient query and retrieval of data at the cost of storage to maintain the index data structure. As data storage has continually decreased in recent years, adding indexes to database tables and views often becomes advantageous to help decrease the time needed to return data with a query.
  2. Dimension-to-fact table key integrity – When fact and dimension tables are used in a data warehouse it’s important that dimension tables contain a key that provides proper data cardinality. This means that records in the dimension table can be matched (or joined) to records in the fact tables(s) without causing a Cartesian product – which is generally something to avoid for both storage and performance reasons.
  3. Materialize data and dates – When possible, materializing calculated data and date tables in the source database can help solve performance in DirectQuery scenarios because it reduces the need to perform these calculations downstream in Power BI.
  4. Distributed tables – For Massively Parallel Processing (MPP) data warehouses, it’s often worth considering reviewing the query performance of the distributed tables stored in the data warehouse. MPP databases store data and query over multiple compute nodes in a cluster, and database tables are typically organized in such a way that the storage of data on compute nodes is distributed in a manner that makes joins efficient and performant for queries. It is possible that distributed tables in the MPP database aren’t set up for efficient queries and it would be beneficial to consider how the tables are distributed by changing query patterns or changing the distributed table setup.

Next, let’s look at optimizing composite models.

Leave a Reply

Your email address will not be published. Required fields are marked *