Composite model optimization
When something is composite, it is made up of more than one thing. A composite model in Power BI is a single data model that combines data from multiple sources, and those sources can be a mix of Import and DirectQuery. Note that composite models can't be used with every data source type, and they introduce complexity to the data model that needs to be managed. If possible, consider avoiding composite models, both to reduce complexity and to limit security exposure. Because imported data can be used to cross-filter data in DirectQuery tables, the queries pushed down to the DirectQuery source can contain values from the imported data. Relational databases usually log every query they process, which means sensitive values that were imported into the model can end up in the query logs of the relational database behind the DirectQuery source.
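To make that risk concrete, here is a minimal sketch of how imported values can surface as literals in a pushed-down query. This is illustrative only: the table and column names are hypothetical, and the exact SQL Power BI generates varies by source.

```python
# Illustrative sketch only: mimics how values from an Import-mode table can
# end up as literals in the query pushed down to the DirectQuery source.
# Table and column names are hypothetical, not what Power BI actually emits.
imported_customer_keys = ["C-0001", "C-0042"]  # values from an imported table

in_list = ", ".join(f"'{key}'" for key in imported_customer_keys)
pushed_down_sql = (
    "SELECT SUM(SalesAmount) FROM FactSales "
    f"WHERE CustomerKey IN ({in_list})"
)

# This literal text is what the relational database receives and, in most
# configurations, writes to its query log.
print(pushed_down_sql)
```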
In general, as stated previously, it's best to start with Import mode for data models and then only consider DirectQuery when there are very large volumes of data or a need for up-to-the-second reporting against a supported data source. Switching to a composite model allows the following:
- Using DirectQuery with multiple DirectQuery-supported data sources
- Using DirectQuery with additional data imported into the model
- Boosting the performance of a DirectQuery data model by importing selected tables into in-memory storage and optimizing the query operations
In order to optimize a composite model, there are two main areas to consider: table storage and the use of aggregations.
Table storage
When a composite model is enabled, each table can be configured in one of three different modes:
- DirectQuery – This mode functions like DirectQuery always has: no data is stored in the Power BI model, and queries are sent to the underlying data source whenever Power BI needs data. It is best used when data must be as up to date as possible or when data volumes are too large to import. Large fact tables are commonly the best candidates for DirectQuery mode.
- Import – This mode functions like Import always has: data is copied into in-memory storage in the Power BI data model. Dimension tables can sometimes be set to Import mode to improve performance when grouping and filtering. Data stored in memory in the model delivers the highest performance in Power BI.
- Dual – This mode allows a table to behave as either Import or DirectQuery, depending on the query. It is useful when a cross-filter or slice of data in a visual can generate a query that pushes down to the same DirectQuery source for both tables. Because Dual also allows the data to be stored in memory in the data model, it can aid performance.
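Storage modes are normally assigned per table in Power BI Desktop's Model view. The sketch below (hypothetical model metadata and helper function, not a Power BI API) simply encodes the reasoning above: tables that share a source with a DirectQuery fact table are candidates for Dual, so that cross-filter queries can be resolved by a single query against that source.

```python
# A minimal sketch (hypothetical metadata, not a Power BI API) of the three
# storage modes and one way to reason about assigning them.
# Hypothetical composite model: table name -> (storage mode, source)
model = {
    "FactSales":  ("DirectQuery", "warehouse"),  # very large fact table
    "DimDate":    ("Dual",        "warehouse"),  # cross-filters FactSales at the source
    "DimProduct": ("Import",      "warehouse"),  # candidate to revisit (see check below)
    "Targets":    ("Import",      "excel"),      # small table from another source
}

def check_dimension_modes(model, fact="FactSales"):
    """Flag tables that share a source with a DirectQuery fact table but are
    not Dual: cross-filtering them would otherwise mix storage engines
    instead of pushing a single query down to the shared source."""
    _, fact_source = model[fact]
    for table, (mode, source) in model.items():
        if table != fact and source == fact_source and mode != "Dual":
            print(f"Consider Dual for {table}: it shares '{source}' with {fact}")

check_dimension_modes(model)  # flags DimProduct in this hypothetical model
```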
Aggregations
Composite models support storing DirectQuery tables as Import tables, and they also add the ability to store aggregated data in the data model. Keeping aggregated data in memory in the Power BI data model can significantly increase query performance over very large DirectQuery datasets. The rule of thumb for aggregation tables is that the underlying source table should be at least 10x larger than the aggregation table. For example, if the underlying data warehouse table contains 5 million records, the aggregation table should contain no more than 500,000 records. If it contains more than 500,000 records, the trade-off of creating and managing the aggregation table is likely not worth the minimal performance gain over simply using the existing DirectQuery table. If it contains 500,000 or fewer records, it will likely show great performance gains and be worth the effort of creating and maintaining.
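As a quick way to apply this rule of thumb, here is a minimal sketch; the function name and threshold parameter are illustrative, not part of Power BI.

```python
def aggregation_worthwhile(source_rows: int, agg_rows: int, ratio: int = 10) -> bool:
    """Rule of thumb from the text: the underlying DirectQuery table should be
    at least `ratio` times larger than the aggregation table to justify the
    cost of creating and managing the aggregation."""
    return agg_rows * ratio <= source_rows

# The worked example from the text: a 5-million-row warehouse table.
print(aggregation_worthwhile(5_000_000, 500_000))  # True: right at the 10x threshold
print(aggregation_worthwhile(5_000_000, 800_000))  # False: likely not worth managing
```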