I am a big proponent of the Kimball method of data warehousing. The denormalized table structure, with Type 1/2/3 dimensions, and surrogate keys provide a lot of benefits for querying and reporting. However, there is one situation that I seem to run into every once in a while, at which time I have that “Oh yeah” moment.
The situation: Records for a student have changed over time within a Type 2 dimension table. We have two goals:
Goal (a): Retrieve all historical fact records (regardless of how the surrogate key value may or may not have changed at any point in time).
Goal (b): Retrieve the current dimension data in the report.
First, let’s review the dimensional data for Danny Zuko. (And before we go any farther, let’s just agree that Grease is one of the finest movies ever made, shall we?)
Sample Dimension Records:
In the above table, the Student ID is our surrogate key in the data warehouse.
Note that Student ID #3 is the current record, showing Danny’s current address and current major, both of which have changed over time.
Next let’s look at some fact records.
Sample Fact Records:
Let’s say we have a requirement to produce a report of attendance history by individual student. If we based the query on the current StudentID (3), we’d inadvertently drop fact table records associated with Student IDs 1 and 2. Following is a solution we routinely put in place in our reporting queries to ensure all history is captured.
Sample query:
SELECT
DSt.StudentName
,DSt.Major
,DSt.AddressFull
,FSA.FromDateID
,FSA.ToDateID
,DaysAttended
,DaysAbsent
FROM DW.DimStudent DSt
INNERJOIN DW.DimStudent DStAll
ON DSt.NaturalKey = DStAll.NaturalKey
INNERJOIN DW.FactStudentAttendance FSA
ON DStAll.StudentID = FSA.StudentID
WHERE DSt.CurrentRecord ='Yes'
How our two goals are satisfied with this query:
Goal (a): Retrieve all historical fact records (regardless of how the surrogate key value may or may not have changed at any point in time). The above query joins DimStudent back to itself a second time, using the natural key. The join to the fact table records are based on the 2nd DimStudent table (i.e., the DStAll alias), so we are assured of retrieving all fact records.
Goal (b): Retrieve the current dimension data in the report. The WHERE statement restrictions are based on the first DimStudent table (i.e., the DSt alias).
This approach assumes that the natural key is retained within the dimension, but not within the fact. It also assumes that the natural key is reliable data.