Power BI datasets – Connecting to Data Sources
Power BI datasets
Data used in Power BI reports uses a Power BI dataset. Power BI datasets are basically a connection string and credentials that Power BI uses to connect to a data source. Every report that is created in Power BI must have a dataset associated with it. This ensures the report has data to visualize.
Power BI datasets are often created using the Power BI Desktop tool. At the time a report is published, both the report (containing visuals) and the dataset are published to the Power BI service. It is important to note that credentials are not sent to the Power BI service and must be re-entered upon publishing – this is useful to allow report developers to create reports locally using their own credentials (often with development data sources) but then change the connection string and credentials after reports are published (to production data sources).
Datasets can be used or shared among reports or can come from uploading Excel files to the Power BI service directly or come from push or streaming sources as well. For example, the Azure Stream Analytics service provides the capability of streaming data directly to a Power BI service streaming dataset for use with dashboards created on the Power BI service.
Power BI dataflows
As data needs change, Power BI continues to adapt to those needs to help organizations understand and use their data to meet continuously changing market demands. Dataflows are a capability in Power BI that empowers Power BI users to perform self-service data preparation, which is sometimes a necessary component of the end-to-end reporting and analytics solution.
Figure 2.3 – Visual representation of how Power BI dataflows work with other Power BI assets and content
Like other data transformation tools, Power BI dataflows connect to disparate data sources and then perform a transformation on the data before it gets used as a Power BI dataset. This is useful when an organization wants to prevent report developers from needing to access the underlying data sources directly (potentially for security or performance reasons). Dataflows also enable the capability of optimizing data transformation that might always need to take place in order to make data usable in reports. For example, if database data columns always need to be renamed and joined with data in another table in order to be used by report creators, then doing it once in a dataflow might be an optimized way to do this rather than needing report creators to do this each time (and possibly different ways each time).
Dataflows also provide a place where data can uniformly be transformed and made ready for Power BI report consumption from other Power Platform or Azure services. Using the Microsoft Azure cloud backbone, services such as Microsoft Dataverse and Dynamics 365 are popular sources for use with dataflows but other cloud services also apply.
While dataflows can be used by Power BI Pro users, some features require Premium Per User (PPU) or Premium capacity. These are features such as DirectQuery from data sources, incremental refresh, computed and linked entities, and the enhanced compute engine.