Data Source Connectivity for Scheduled Data Refresh in Power BI
***Note this post applies to Power BI for Office 365 (deprecated as of 12/31/2015). Though some concepts remain the same or similar, all details are not necessarily applicable to the new Power BI V2 service.***
As discussed in my earlier blog entry about when a gateway is required, we know that if you need to schedule a daily or weekly refresh for a Power BI workbook that's up in a Power BI site, and it points to an on-premises database, there needs to be a gateway and data source defined in the Power BI Admin Center. This connectivity is usually defined by the administrator of your Power BI Site in Office 365.
Think about this though...the Power BI Admin Center defines the gateway (where the server is at) as well as the data source (database) that's being accessed....does that mean the connection string and credentials used in the Power BI Admin Center "trump" what's been set up in the workbook? You betcha it does! I don't know how the Power BI team refers to it**, but the way I think about it is there is a "matching process" for the workbook to locate which data source in the Power BI Admin Center to use.
**Update: Since writing this blog entry I learned that the Power BI team calls this a Discovery Process.
The Matching Process
The thing I'm trying to articulate with this blog entry is this...how does Excel Workbook D know to use Data Source 2 for its data refresh operations in the Power BI Site?
Based on what I've tested so far, the following needs to match between what's defined in the workbook and what's defined in the Power BI Admin Center:
1. Server name matches between workbook and a data source in the Power BI Admin Ctr
2. Database name matches between workbook and a data source in the Power BI Admin Ctr
3. Same provider is used in the workbook and a data source in the Power BI Admin Ctr
4. User who is defining the data refresh has permission to the data source in Power BI
This has a couple of very specific implications, particularly #4 if security is involved for the data coming out of the source data. But we'll get to that. First, let's back up a moment and talk connections, shall we?
Connections and Credentials in Power Pivot
In the Power Pivot workbook, a data connection to the source system being used is defined. The person creating the workbook usually uses their own credentials in this situation. They manually refresh the data while they're developing and validating the Power Pivot model (i.e., while working in Excel before it's finished & uploaded to the Power BI Site).
Most users will assume that what's been defined in the workbook will continue to be used once the refresh is scheduled in Power BI - but that's not necessarily true unless the administrator intentionally uses the same connection string from the workbook. The problem with that theory though is that the administrator won't be able to use the user's individual credentials. So the administrator would usually set up a specific service or application account to run the refresh - this is fine if all the data is open. Problem being that this account used in the Power BI Admin Center may or may not have the same permissions to data in the source system as the individual user.
If the source database is in Azure, then this isn't a concern. The refresh process will continue to use the credentials stored in the workbook. However, if the source database is on-premises, thus requiring a Gateway and Data Source in the Power BI Admin Center, then this "matching process" comes into play. Let's explore each of the 4 criteria...
1. Server Name Matches Between Workbook and Power BI
This one is pretty straightforward - same server is referenced inside the workbook connection and the data source in the Power BI Admin Center.
2. Database Name Matches Between Workbook and Power BI
This second one is also pretty obvious - same database (aka Initial Catalog) is referenced inside the workbook connection and the data source in the Power BI Admin Center.
3. Same Provider is Used in the Workbook and in Power BI
How the connection in the workbook is handled by default depends on the individual computer. A Power Pivot connection will default to using the Native Client provider if (a) SQL Server is installed on the computer, (b) SQL Server tools are installed on the computer, or (c) the user has downloaded sqlncli.msi from the SQL Server Feature Pack. When using the native client provider, 11.0 = SQL Server 2012, and 10.0 = SQL Server 2008 and R2. (There is no native client provider for SQL Server 2014.) Here's what it looks like in Excel of the native provider is used:
If SQL Server tools or the provider has *not* been installed on the user's computer, then the Power Pivot data connection will default to OLE DB (although it can be changed by the user).
A data source being created in the Power BI Admin Center defaults to OLE DB (although it can also be changed) which is a different default than inside of Power Pivot. If most users of the Power BI system don't have SQL Server installed (which would be the norm), and you expect them to choose the simplest/default connection types in Power Pivot, then using OLE DB here in the Admin Center may be the least troublesome.
However, if a user has uploaded a workbook with a native client connection (or .NET), but the data source in the PBI Admin Center is set up using OLE DB, the refresh operation will fail.
****Thanks to Gerhard Brueckl who alerted me to this issue with the mismatch of providers.****
So, depending on your environment, this is a great opportunity to give users some documentation or FAQ for what type of connections you want them to use in Power Pivot.
4. User Who is Defining the Data Refresh Has Permission to the Data Source in Power BI
This last item is related to the "users and groups" who have been defined in the PBI Admin Center as having access to the data source. Permissions to each data source can be critical, depending on security of the data coming out of the data source - there's more to the story here, but I'll save that for another post.
If a user tries to set up a data refresh and the "matching process" finds a data source that meets criteria 1-3 but not criteria 4, you'll get an error with a message like this: "We are unable to refresh the on-premises data connection. Reason: You are not authorized to access the data source. Please contact your Power BI Administrator to get access to the data source."
Credentials Used for Power BI Data Refresh
One last thing I want to make sure is emphasized: the credentials set up in the Power BI Admin center data source are what's used for scheduled data refresh & on-demand data refresh in the Power BI Site. I did a test where the workbook that was uploaded had user credentials; however, the "matching" data source in the Power BI Admin Center used a service account for its credentials. Ran a Profiler on the local database to verify that the credentials defined in the Power BI Admin Center (the service account) are what was being used:
So, in summary, keep in mind that what's in the Power BI Admin Center will "trump" what's in the workbook for data refresh from on-premise data sources (whereas Azure will continue to use the connection defined in the workbook). Watch out for this change in credentials if you have limited security to data going in in your environment. You want to avoid the situation where User Emily uploads a workbook with just Division A data, for example, and then the scheduled refresh runs with an account that has broad permissions & then suddenly User Emily sees everything next time she opens the workbook. This can be managed via permissions, it just requires knowledge of what is going on in the environment.
Finding More Information
Power BI Support Blog - Connectivity 101 <--Good tip here on testing connectivity
Office - Schedule data refresh for workbooks in Power BI for Office 365