Checklist for Finalizing a Data Model in Power BI Desktop
My colleague and friend Meagan Longoria has been giving me a few “gentle nudges” to update an old post about this topic.
This post is a checklist of suggested items to be completed in your Power BI data model, within Power BI Desktop, before you consider it “done done.” This is not a how-to post; it's meant to truly be a checklist with just a bit of context in each 'why' column. (Though I concede I got more than a little wordy at 14 pages.)
A downloadable version: Checklist
Checklist last updated: 12/27/2017 (V2)
The checklist focuses on the data model itself, rather than report design. For sound practices and suggestions about report design, check out Meagan's series: Design Concepts for Better Power BI Reports.
This checklist does **NOT** include:
- Visuals or reports in Power BI Desktop
- Performance-related specifics (because it always depends)
- Testing specifics or approaches
- After publishing to either Power BI Service or Power BI Report Server
- Ongoing administration and management such as data refresh
Terminology:
- Data model = the data imported to Power BI; also known as the dataset (formerly known as Power Pivot)
- PBIX = the Power BI Desktop file
- Field = columns and derived measures which are displayed in the field list
- Query = source query in the Query Editor (formerly known as Power Query)
Why Is the Power BI Data Model So Important?
There are many things you can do with report visuals – change field names, adjust formats, and lots more. However, adjustments to things like names and formats within the report should be the exception rather than the norm. If all fine-tuning is done within the data model, it’s done *once* and every downstream report takes advantage of it. You will save yourself much time and effort, and improve consistency from report to report tremendously. Other users who consume your data model through reports and/or Q&A will appreciate the extra effort you’ve made to improve the user experience.
PBIX File:
| To Do | Why |
| --- | --- |
| Use a Consistent, Unchanging PBIX File Name with Business Relevance | The PBIX name you assign is what the user will see after it’s been published. The name for the PBIX file should describe the set of reports, and it should be a consistent name which does not change over time. While a version number (like “SalesAnalysis-V1-December”) is ok to keep straight interim copies of the PBIX, the file name needs to remain consistent once it is published. This ensures favorites, links, and apps all continue to work whenever changes are released. |
| Consider Reusing an Existing Dataset | Before starting a new PBIX, give some consideration to whether the data already exists in another Power BI data model. If new reports are the goal, yet the data already exists, a “hub and spoke” approach may work in which an existing published data model is reused. This can be done by creating new reports in the Power BI Service that refer to an existing dataset, by using a Power BI Service Live Connection, or via Analyze in Excel functionality. A hub and spoke approach (a) decreases the amount of redundant data stored in various PBIX files, (b) reduces the number of data refresh operations, (c) reduces future maintenance needs, (d) improves consistency because there are fewer files with data models, and (e) facilitates the separation of dataset development from report development. More details: Reusing Datasets Imported to the Power BI Service. |
| Use the Correct Release of Power BI Desktop Based on Your Target Deployment Destination | There are now two ‘flavors’ of Power BI Desktop. The standard one is updated monthly, and it correlates with functionality releases in the Power BI Service. The newer one is Power BI Desktop Optimized for Power BI Report Server, which is updated every 3-4 months. If you develop Power BI solutions which are deployed to both the Power BI Service and to Power BI Report Server, you will need to carefully manage which release you are using for your data model development (they can run side by side on the same machine). Keep in mind that if collaboration will occur on a Power BI Desktop file, all users need to have the same version of Power BI Desktop installed on their machine (this is easier if you install the standard version of Power BI Desktop via the Windows Store so that it will update itself in the background). |
| Manage the Dataset Size | File size limits impact the amount of data which can be imported into the data model. At the time of this writing, the file size limits (after compression is applied) are: Power BI Desktop – based on individual PC system capability; Power BI Service – 1GB, or 10GB if the PBIX is deployed to a Premium workspace; Power BI Report Server – 2GB. |
| Utilize Version History for PBIX File | Where to store the original PBIX file is an important decision. One common location is OneDrive for Business, which has built-in versioning available, so you can go back to a prior version if something goes wrong. A source control repository is also ideal. The original file should reside in a team site, not an individual user’s laptop. If report edits are made directly in the Power BI Service, you may also need to download from the Service and synchronize with the version in your version history. |
| Consider Using a Power BI Template | A Power BI template (PBIT) can contain anything from a PBIX, except the data itself. (Note this is different from solution templates available on AppSource.) Templates can contain common queries and data tables (such as a Date table in every model). The original template file should reside in the same original location with version history. |
| Consider Using a Power BI Theme | A Power BI report theme allows for a consistent use of a color palette in a report. Though theming affects reports and not the data model (the data model is the focus of this checklist), it is possible a data modeler will handle this task since custom report themes are based on JSON. The original JSON file should reside in the same original location with version history. (A minimal theme sketch follows this table.) |
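For reference, a minimal sketch of what a custom report theme JSON file can look like; the theme name and hex color values below are placeholders, not recommendations:

```json
{
  "name": "Contoso Corporate Theme",
  "dataColors": ["#31B6FD", "#4584D3", "#5BD078", "#A5D028", "#F5C040"],
  "background": "#FFFFFF",
  "foreground": "#252423",
  "tableAccent": "#31B6FD"
}
```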
PBIX File Properties:
| To Do | Where | Why |
| --- | --- | --- |
| Decide if Using Auto Time Intelligence | File > Options and Settings > Options > Current File: Auto Date Time | Auto time intelligence is enabled by default, and it applies to each individual PBIX file (there’s not a global option). For each date or datetime column in the dataset, a hidden date table is created in the model to support time-oriented DAX calculations. This is great functionality for newer users, or if you have a very simple data model. However, if you typically utilize a standard Date table for the analytical dates, then you may want to disable the hidden date tables to reduce the file size – particularly if you have some dates which are just attributes (most dates can be analytical wherein you do analysis like YoY, though some dates can just be simple attributes which don’t need to be connected to a Date dimension). Using the built-in functionality along with a user-generated Date table can work as well; a sketch of one follows this table. (Tip: You can view the hidden date tables if you connect to the PBIX via DAX Studio.) |
| Control Options for Relationships | File > Options and Settings > Options > Current File: Relationships | Once the data model and relationships are complete, you may wish to ensure the relationships do not change without your knowledge. To do this, disable two items in each PBIX file (there’s not a global option): “Update relationships when refreshing queries” and “Autodetect new relationships after data is loaded.” |
| Use Preview Features Carefully | File > Options and Settings > Options > Global: Preview Features | Public preview features should be enabled cautiously, since preview features are typically supported in neither the Power BI Service nor Power BI Report Server. It’s easy to forget you used a preview feature which won’t work properly after the report has been published. |
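A minimal sketch of a user-generated Date table created with DAX (the date range and the column list are placeholders; in practice a Date table often comes from the source database or a shared template):

```dax
Date =
ADDCOLUMNS (
    CALENDAR ( DATE ( 2015, 1, 1 ), DATE ( 2018, 12, 31 ) ),
    "Year", YEAR ( [Date] ),
    "Month Number", MONTH ( [Date] ),  -- used as the Sort By column for Month Name
    "Month Name", FORMAT ( [Date], "mmmm" )
)
```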
Query Editor:
| To Do | Where | Why |
| --- | --- | --- |
| Decide the Data Connectivity Mode | Home > Edit Queries | A very important decision is whether to utilize import mode or DirectQuery mode. Guidance about this decision can be found in the Planning a Power BI Enterprise Deployment whitepaper. |
| Import the Minimum # of Columns | Home > Edit Queries | Importing only the columns that are needed simplifies the model and reduces its size. Model size is very important since the Power BI model resides in-memory (when in import mode). Since hidden columns still consume size and memory, the rule of thumb is: if a column doesn’t contribute to a report, a calculation, or a relationship, then it shouldn’t be imported to the model. |
| Disable Load for Intermediary Queries | Home > Edit Queries > All Properties > Enable Load | Intermediary queries in the Query Editor are very useful in certain circumstances, such as when two queries are merged together into one. Any intermediary queries should be disabled from being loaded to the data model. |
| Parameterize Data Source Connections or Other Changing Data | Home > Edit Queries > Manage Parameters | Parameters can be useful to minimize hard-coding and reduce maintenance when changes need to occur. For data source connections, by default the connection is stored in the first step of every query. For dev>test>prod scenarios, changing the connection once in a parameter is far less tedious than changing it in every query. It also reduces the chance of error or omission. (A sketch in M follows this table.) |
| Set Data Source Privacy Level | File > Options and Settings > Data Source Settings > Edit Permissions > Privacy Level | When importing data from various types of sources, the optimizations available vary per data source (such as query folding, which passes filters down to an underlying data source). If data is sensitive and you do not want it to be passed in-memory to another server for consolidation or aggregations, and/or you want to ensure a DBA monitoring the server cannot see values, you can control isolation by setting the privacy level. This may result in a slower consolidation or comparison of data, but is more secure for sensitive data. |
| Set Encryption for Data Source Connections | File > Options and Settings > Data Source Settings > Edit Permissions > Encryption | Connections should be encrypted whenever possible as a standard security measure. This does need to be supported by the data source which is being queried. |
| Use Consistent Source File Locations | Home > Edit Queries > Data Source Settings | When accessing sources such as flat files (ex: CSV or Excel), make sure the files are accessible from a centralized location which is not based on an individual user path. This ensures a data refresh operation won’t fail if another person executes it. |
| Retrieve Source Data from a Database View | Home > Edit Queries | When retrieving data from a relational database, select data from a database view rather than a table whenever possible. Different sets of views allow for different calculations and customizations for different purposes: DW views can feed a semantic layer for corporate BI delivery (ex: Analysis Services), while USR or RPT (user / report) views can be customized for power users who are allowed to access relational data (ex: from Power BI Desktop). |
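A minimal sketch in M, assuming a parameter named ServerName has already been created via Manage Parameters; the database, schema, and view names are hypothetical:

```m
let
    // The parameter replaces a hard-coded server name, so promoting from
    // dev to test to prod means changing one parameter value, not every query.
    Source = Sql.Database(ServerName, "SalesDW"),
    // Retrieving from a report view (per the last item above) rather than a table:
    SalesAnalysis = Source{[Schema = "rpt", Item = "SalesAnalysis"]}[Data]
in
    SalesAnalysis
```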
Tables and Relationships:
| To Do | Where | Why |
| --- | --- | --- |
| Use Friendly Business Names for Tables | Data pane > Fields | A table name like “Students” is nicer to look at within a field list than something like “TblStdt.” The name should represent the data contained within the table. Other contextual information about the type of table is very helpful to include in the name, such as: Financial Monthly Snapshot, Financial Transactions, or Financial Current Balances. Though often debated, either singular or plural table names are fine - just be consistent. |
| Validate Relationships are Accurate | Relationships pane | Depending on the source of data, relationships may or may not be auto-generated. One of the most critical tasks is to verify all required relationships are in place and accurate with respect to the 1-to-many direction of each. |
| Verify Relationships Do Not Change Direction | Relationships pane | The direction of a relationship affects how filters propagate through a model. If the direction of a relationship changes, a ‘dead-end’ is created, which means reports do not display correct data. This is more likely to occur in more normalized data models. Verifying this is a critical piece of model validation. |
| Validate and Use Bi-Directional Relationships Purposefully | Relationships pane | Bidirectional relationships should be used minimally, and with a specific purpose. They should be the exception – to accomplish something specific – rather than the default, particularly if there is any complexity to your data model. Sometimes CrossFilter() can be a substitute for a bidirectional relationship (see the sketch after this table). Verifying that any bidirectional relationships are set up correctly is critical to model validation. (Tip: According to the BiDi whitepaper, a relationship with a Date table should always be one-to-many, never bidirectional.) |
| Validate and Use Inactive Relationships Purposefully | Relationships pane | Inactive relationships in the data model should exist only when they have purposefully been set up that way, in which case the relationship is invoked via a DAX calculation (see the sketch after this table). Verifying that any inactive relationships are set up correctly is critical to model validation. |
| Create Synonyms to Improve Q&A Experience | Relationships pane > Modeling > Synonyms | If report consumers also use Q&A functionality for natural language querying (ex: Sales by Year by Division), the thoughtful creation of synonyms improves the user experience significantly, so that users do not always have to input field names exactly. For example, perhaps the terms Division and Business Unit are interchangeable at your organization. Or, perhaps there are common acronyms users might input into Q&A that need to be translated to full field names. |
| Assume Referential Integrity When Possible | Home > Manage Relationships | If you are utilizing data in DirectQuery mode from a reliable and clean data source (such as a data warehouse), you can improve performance by assuming referential integrity. |
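Two minimal DAX sketches for the relationship items above (the Sales, Date, and Customer table and column names are hypothetical; each measure is entered separately in Power BI Desktop). The first invokes an inactive relationship for a single calculation; the second cross-filters in both directions for one measure rather than setting the relationship itself to bidirectional:

```dax
Sales Amount by Ship Date =
CALCULATE (
    SUM ( Sales[Sales Amount] ),
    USERELATIONSHIP ( Sales[Ship Date], 'Date'[Date] )  -- activates an inactive relationship
)

Customers With Sales =
CALCULATE (
    DISTINCTCOUNT ( Customer[Customer Key] ),
    CROSSFILTER ( Sales[Customer Key], Customer[Customer Key], BOTH )
)
```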
Fields, Columns, and Measures:
| To Do | Where | Why |
| --- | --- | --- |
| Create Unique Field Names Across the Entire Dataset | Data pane | Although Power BI permits columns with the same name to exist in different tables, that is a poor practice to allow in a data model. Let’s say there is a Sales Rep table and a Sales Manager table, and both have a Sales Region column. The two columns should be renamed to Sales Rep Region and Sales Manager Region for clarity (or, even better, renamed in an underlying database view where Power BI retrieves the data). The general rule to ensure report labels are easy to understand: the name assigned to each column needs to be self-explanatory on its own rather than relying on the context of its table. |
| Expose a Field Only Once in the Entire Dataset | Data pane | Sometimes a column exists in the data model more than once. The ideal choice is for the redundant column to not be imported at all. If that isn’t viable for whatever reason, then make sure the “extra” copy of the column is hidden from report view. For instance, if a Sales Header ID exists in both the Header and the Lines tables, make sure it’s hidden in the Lines table and only shown once in the data model. This way, users won’t accidentally use both columns, which could require two separate filters and provide a very poor user experience. |
| Hide Fields Not Utilized for Reporting | Data pane | IDs and surrogate keys are needed for relationships, but are not useful for reporting. Hiding them simplifies the data model because there are fewer fields shown in the field list. The consumers of your data model will appreciate the lack of what they’d view as clutter. (Reminder: if a column is not needed for something – a relationship, the basis for a calculation, sorting, something – then don’t import it at all.) |
| Use Friendly Names for Fields | Data pane | A field such as “Student Name” is nicer to look at for reporting than something like “Stdt_nm,” which may be how it’s stored in the source. Since field names impact the naming of column titles, filters, and slicers, assigning friendly names with business relevance is well worth a bit of time investment. Just be consistent with respect to spaces, casing, and abbreviations (such as Desc or Nbr). Try to use acronyms only if you are certain all users know their meaning. |
| Set Formatting for All Numeric and Date Fields | Modeling > Formatting | It’s no fun to add a field onto a report that needs to be reformatted every single time. Defining units as whole numbers, or amounts as currency, is a helpful timesaver on the reporting end. Don’t forget to also set the thousands separator (comma) to assist with readability. |
| Specify Sorting for Columns | Modeling > Sort | Creating a default sort order is a common need for certain types of columns. For example, you may need to sort the Month Name column by the Month Number column. Oftentimes the column serving as the sort value can be hidden from report view. |
| Set Default Summarization for All Numeric Columns | Modeling > Properties > Default Summarization | The summarization default is sum, but some numeric columns are more suitable as count, min, or max. For example, high temperature per day would never be summed for a meaningful value; average is likely a better choice. Setting this properly allows subtotals and totals to be presented properly. Summarization for columns that aren’t truly quantitative, such as Customer Number or Year Number, should be set to “don’t summarize.” More details: Why the Default Summarization Property in Power BI is So Important. |
| Create Useful Calculated Columns | Modeling > Calculations > New Column | Creation of calculated columns (aka derived fields) is vital for enrichment of the data model. A very simple example is names – perhaps the underlying data source keeps First Name and Last Name in separate columns; you may wish to derive a Full Name field for reporting purposes which concatenates them together (see the sketch after this table). Note: there are times when it’s preferable to derive columns in the Query Editor instead of DAX: (a) to take advantage of query folding, (b) to keep intermediary columns from being loaded to the model, (c) to achieve faster processing time since DAX columns compute sequentially, or (d) to potentially achieve a better compression rate. Marco Russo discussed the last two items in this post. For complex calculations in a larger model, test the difference in performance. |
| Create Useful Calculated Measures | Modeling > Calculations > New Measure | Calculated measures (aka explicit measures) are extremely useful to augment reporting and analysis. Year-to-date and year-over-year are common examples of calculated measures which rely upon the current context at runtime (see the sketch after this table). Typically a model starts with common calculations, and continues to evolve over time as business users identify additional relevant calculations. |
| Decide on Location for Calculated Measures | Modeling > Properties > Home Table | Some Power BI practitioners prefer to locate all calculated measures in a specific table just for measures. However, this results in calculated columns and calculated measures residing in different tables, which might seem confusing to users. If you do choose to use a “home table” for all measures, or for certain critical measures, focus on organization to help users find them easily. |
| Consolidate Intermediary Calculated Columns | N/A | Frequently it is easier to understand and test a complicated calculated column as a series of steps that build upon each other. The tradeoff for this understandability is additional size and memory usage. Therefore, you may consider consolidating calculated columns once testing is complete rather than leaving the separate steps permanently in the model. If they remain, make certain that any intermediary calculated columns are hidden from report view. Alternatively, you could add the new columns in the Query Editor (instead of DAX) and delete the intermediary columns before the last step in the query. |
| Decide on Use of Implicit Measures | N/A | An implicit measure utilizes the default summarization setting, such as “Sum of Sales Amount.” Some Power BI practitioners prefer to use explicit calculated measures only, in which case base numeric columns are hidden. Like many other things, consistency here is what matters. |
| Set the Data Category | Modeling > Properties > Data Category | Address-related columns, such as city/state/zip and so forth, need to be categorized in the data model to ensure geocoding can occur properly. Typically, Power BI is very good about detecting data categories from the data, but the data category for addresses, URLs, and barcodes needs to be verified. |
| Create Columns to Support Query String Parameters | Modeling > Calculations > New Column | There are some rules for using URL query string parameters, such as no spaces allowed in the name. Therefore, there may be circumstances when you need to create specific columns for use with query string parameters. In this case, you’ll likely want to name them something like ‘pStateName’ and ensure they are hidden from report view (see the sketch after this table). |
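Minimal sketches of the calculated column and measure patterns described above (each formula is entered separately in Power BI Desktop; all table and column names are hypothetical):

```dax
-- Calculated column enriching the model with a concatenated name:
Full Name = Student[First Name] & " " & Student[Last Name]

-- Explicit base measure, plus a year-to-date measure that relies on context:
Total Sales = SUM ( Sales[Sales Amount] )
Sales YTD = TOTALYTD ( [Total Sales], 'Date'[Date] )

-- Space-free calculated column to support a URL query string parameter
-- (hide this column from report view):
pStateName = SUBSTITUTE ( Geography[State Name], " ", "" )
```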
Other Dataset Items:
| To Do | Where | Why |
| --- | --- | --- |
| Ensure a ‘Data As Of’ Date Exists in the Dataset | Modeling > Calculations > New Column | It is common to display a ‘Data As Of’ date on reports. This should exist in the data model, based on the data itself and not reliant on the current datetime. The objective is for report consumers to understand the effective date of the data being displayed. As-of dates tend to be most significant for certain types of data such as sales quotas, or during key periods such as month-end. |
| Create Hierarchies for Usability | Field List > New Hierarchy | Date columns (such as Year>Quarter>Month) are a common scenario for use of a hierarchy and should be present in most models. Geography columns (such as Country>State>City) are great candidates as well. After a column has been added to a hierarchy, it’s up to you whether the individual columns should still be visible and available for reporting (sometimes users find seeing both the individual columns and the hierarchy columns to be confusing). |
| Set Up Row-Level Security | Modeling > Security > Manage Roles | Row-level security is useful to specify which rows of data certain users are permitted to view. There are several techniques for how to accomplish RLS; one is sketched after this table. (Tip: if you find you are repeating work in numerous PBIX files to set RLS, that means it may be time to look at using Analysis Services instead.) |
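One common RLS technique, sketched with a hypothetical table and column: a DAX filter expression defined on a role (Modeling > Security > Manage Roles) that compares a column to the signed-in user, so each user sees only their own rows:

```dax
-- Filter expression on a hypothetical Salesperson table:
[Email Address] = USERPRINCIPALNAME ()
```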
A Few Common DAX Best Practices:
This is a very short list of good DAX habits – certainly not intended to be a complete reference.
| To Do | Where | Why |
| --- | --- | --- |
| Use a Naming Convention When Referencing Measures and Columns | Modeling > Calculations | When referencing a measure in a DAX calculation, do not include the table name. Conversely, when referencing a column in a DAX calculation, always include the table name as a prefix in the format of 'Table'[Column]. This commonly used best practice in the Power BI community helps you identify whether a measure or a column is input to the calculation, especially if you are troubleshooting context and calculation results. If you need convincing, watch this 2017 PASS Summit video from Marco Russo. |
| Apply the Most Selective Condition on Inner Functions | Modeling > Calculations | If a DAX calculation is nested, the innermost function is evaluated first. For performance optimization purposes, the most selective condition should typically be nested innermost to limit data as early as possible. |
| Use Error Messaging in DAX | Modeling > Calculations | The Error() function provides a description if a calculation result cannot be rendered, which can prevent a great deal of user frustration should something go wrong with the report display. Error messaging is particularly important when displaying data which is manually maintained. |
| Consider Using the Divide() Function | Modeling > Calculations | If you utilize the Divide() function instead of Column A / Column B, you don’t have to worry about nulls or coding around divide-by-zero errors. This standard practice prevents invalid numbers being displayed on reports (which in turn minimizes users being tempted to download to Excel just to ‘clean up’ the report output). In a typical data model, use of Divide() will be fine. However, if you have complex calculations and/or a complex data model, check out Alberto Ferrari's post about Divide() performance. |
| Use Variables as a Common Practice | Modeling > Calculations | Use of variables in DAX is becoming a best practice, as it can make the syntax shorter and easier to read. Variables can also improve performance because they are “lazy”: each variable is evaluated only once, and not at all if it is never used. (A sketch combining several of these habits follows this table.) |
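A sketch tying several of these habits together in one hypothetical measure (the Sales table and its columns are placeholders): variables that evaluate only once, Divide() instead of the / operator, and Error() for a readable failure message:

```dax
Profit Margin % =
VAR TotalRevenue = SUM ( Sales[Revenue] )
VAR TotalCost = SUM ( Sales[Cost] )
RETURN
    IF (
        TotalRevenue < 0,
        ERROR ( "Revenue is negative - check the source data." ),
        DIVIDE ( TotalRevenue - TotalCost, TotalRevenue )
    )
```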
Documentation Within a PBIX:
Last, but certainly not least!
| To Do | Where | Why |
| --- | --- | --- |
| Solution Documentation | Report > New Page | You may consider using the first or last report page as a place to include helpful tips, definitions, or an introduction to the solution. It can also link to additional information, identify who to contact with questions, and/or identify the owner of the solution. |
| Descriptions for Fields | Field List > Properties > Description | A description can be immensely helpful for users to understand the contents of a column or measure, and when and how to use it. When a description has been defined, it is shown in the field list as a tooltip when the mouse hovers over the field name. (Tip: if you find you are repeating work in lots of different PBIX files to input descriptions, that means it may be time to look at using Analysis Services instead.) |
| Comments in DAX Calculations | Modeling > Calculations | Comments in DAX (using // or -- for a single line, or /* */ for a block) can be very helpful to describe what’s happening in a calculation. This can help others understand the calculation, or leave yourself a reminder. |
| Comments Within M Scripts | Home > Edit Queries > View > Advanced Editor --or-- Home > Edit Queries > Applied Steps > Properties > Description | Comments in the M script (using // for a single line or /* */ for a block) can be very helpful to describe the data transformations which are occurring. Comments can be created one of two ways, as Chris Webb describes in his blog post. (A sketch follows this table.) |
| Description for Query | Home > Edit Queries > All Properties > Description | If a query is doing something unusual, or is serving as an intermediary query, it is very helpful to include a description of the query’s purpose. |
| Naming of Query Steps | Home > Edit Queries > Query Settings > Applied Steps | Steps in the Query Editor get auto-named, which is sufficient for most needs. If you are doing a lot of data transformations, or something complex, it can be helpful to rename the query steps to describe what’s happening at a particular stage. |
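A sketch showing both M comment styles together (the file path and step name are hypothetical):

```m
/* Staging query for the merge into the Sales query.
   Load to the data model is disabled. */
let
    Source = Csv.Document(File.Contents("\\fileshare\data\sales.csv")),
    // Promote the first row to column headers:
    PromotedHeaders = Table.PromoteHeaders(Source)
in
    PromotedHeaders
```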