Welcome to Part 2 of a 3-part series on using the new Entity-Based Staging Process in Master Data Services (MDS) 2012.
Part 1: Overview of the Staging Process
Part 2: Loading a Model Where All Attributes Come From a Source System
Part 3: Loading a Model Where Some Attributes Are Maintained Directly in MDS
Sample Model
First, a few quick words about the sample model we’re going to use. It’s a simple Account Model which has 3 entities. The Account Type and Account Class entities exist solely to populate the primary Account entity.
Conceptually the Account Model looks like this:
The Account Entity in MDS looks like this:
The remainder of this blog entry discusses loading the Account model at the leaf (most detailed) level. All attributes, for all entities, come from a source system.
Leaf Member Staging Tables
To support the Account model, 3 staging tables were created by MDS when the entities and attributes were set up:
Recall that the Account Class and Account Type are domain-based attributes, used as follows:
- Account Type: Contains a distinct list of Account Type code and name – used as the lookup table for the main Account entity.
- Account Class: Contains a distinct list of Account Class code and name – used as the lookup table for the main Account entity.
- Account: Contains the accounts, plus the mapped values of Account Type and Account Class.
For purposes of this example, we are going to assume all of the attributes in these entities come from a source system external to MDS. (Note: Part 3 of this series looks at how to handle it if one or more attributes are maintained directly in MDS – i.e., the data doesn’t exist in a source system anywhere.)
SSIS Package
Following is an example of the flow for the SQL Server Integration Services (SSIS) package. Note that in a Production ETL environment a few things will be more dynamic, but the purpose of this example is to be simple & straightforward.
Step 1: Truncates each of the 3 staging tables. This is to eliminate the data from the last time this process ran.
Step 2: Data flow to populate each entity in the model.
In your source query, make sure you pull back the code field for the domain-based attributes rather than the name field. For the Account entity, we’ll only populate Account Type and Account Class fields with the code value. However, if we were populating the Account Class or Account Type entity, we would populate both code and name.
The derived columns include:
- ImportType of 2 (which means updates can be accepted vs. a failure). There are 6 codes to choose from to control the behavior. The list of ImportType codes can be found here: http://msdn.microsoft.com/en-us/library/ee633854.aspx
- ImportStatus_id of 0 (which means each record is ready).
- BatchTag which I chose to create from an ETLExecution_id (i.e., a unique code used for ETL job management) plus a string which refers to this entity. The reason the string is appended is so that each BatchTag is unique for every data flow. MDS will generate an error if > 1 of the same BatchTag executes at the same time. Therefore, each of the 3 data flows in my example append a different string so they are allowed to run in parallel for speed.
The data conversions are only needed because my source data was non-Unicode, and MDS is Unicode.
Mapping of fields into the Leaf Member Staging Table looks like this:
The Code can be ignored if you specified for the Code to be created automatically when the entity was set up (a new feature in MDS 2012).
More information about populating the Leaf Member Staging Table can be found here: http://msdn.microsoft.com/en-us/library/ee633854.aspx
Step 3: Load the MDS Model. With this step, we’re going to be invoking the MDS-provided stored procedure that takes the data out of staging and moves it into the actual model.
Each staging table has its own stored procedure. The stored procedure name will be in the format of udp_EntityName_Leaf (unless a different name was chosen when the entity was created).
Syntax is as follows:
DECLARE @RC int
DECLARE @VersionName nvarchar(50)
DECLARE @LogFlag int
DECLARE @BatchTag nvarchar(50)
SET @VersionName = N'VERSION_1'
SET @LogFlag = 1
SET @BatchTag = (? + ' Account')
EXECUTE @RC = [stg].[udp_Account_Leaf]
@VersionName
,@LogFlag
,@BatchTag
GO
Note that the BatchTag needs to match what was specified in the Step 2 data flow. I’ve set mine to be unique per entity so the batches can be executed in parallel. The ? in the screen shot above signifies I’m using a parameter (which is mapped to a variable which contains the ETLExecution_id); this helps the BI team managing nightly loads know which run this was associated to.
A LogFlag of 1 specifies logging is enabled. The VersionName also needs to be specified – a slight complication if you do a lot of versioning in your environment.
Execution of this step to load the model is what correlates to the Integration Management page in MDS:
More information about loading the MDS Model can be found here: http://msdn.microsoft.com/en-us/library/hh231028.aspx
Step 4: Validation of the Model. Until validation is performed, the new inserts and updates are there but in a “waiting validation” status. If any Business Rules have been defined for the model, they will be applied when Validation is run.
The Validation applies to the model as a whole, so it only needs to be done once regardless of how many entities we just loaded.
Syntax is as follows:
DECLARE @ModelName nVarchar(50) = 'Account Model'
DECLARE @Model_id int
DECLARE @UserName nvarchar(50)=?
DECLARE @User_ID int
DECLARE @Version_ID int
SET @User_ID = (SELECT ID
FROMmdm.tblUser u
WHERE u.UserName = @UserName)
SET @Model_ID = (SELECT Model_ID
FROM mdm.viw_SYSTEM_SCHEMA_VERSION
WHERE Model_Name = @ModelName)
SET @Version_ID = (SELECT MAX(ID)
FROM mdm.viw_SYSTEM_SCHEMA_VERSION
WHERE Model_ID = @Model_ID)
EXECUTE mdm.udpValidateModel @User_ID, @Model_ID, @Version_ID, 1
More information about validating the Model can be found here: http://msdn.microsoft.com/en-us/library/hh231023.aspx
The @UserName variable in the above validation statement is mapped to a parameter which references the UserName of the system variable. As long as all persons or service accounts have access to the model, that should work.
The 5th and final step I have in my package is to check if any errors occurred and then alert the BI Team and/or Data Steward if needed. This is done by querying the views provided by MDS for each entity.
In my process, I just do a simple query that checks if 1 or more error records exist and if so, send an email. Note the ResultSet is set to Single row. The total count is passed to a variable. If the value of the variable is >=1, the next step of generating an email will be kicked off.
The syntax is as follows:
;WITH CTE_CheckEachTable AS
(
SELECT Count_MDSErrors = COUNT(*)
FROM stg.viw_Account_MemberErrorDetails
UNION ALL
SELECT Count_MDSErrors = COUNT(*)
FROM stg.viw_Account_Class_MemberErrorDetails
UNION ALL
SELECT Count_MDSErrors = COUNT(*)
FROM stg.viw_Account_Type_MemberErrorDetails
)
SELECT Count_MDSErrors = SUM(Count_MDSErrors)
FROM CTE_CheckEachTable
More information on viewing errors can be found here: http://msdn.microsoft.com/en-us/library/ff486990.aspx
Out Of Scope
The above process excludes the following:
- Changing the code for an existing record (i.e., usage of the New Code field in the staging table).
- Deactivating or deleting an existing record (i.e., via usage of specific ImportType codes).