In this article we’ll discuss Slowly Changing Dimensions and their impact on BI applications.
This author first saw the impact of slowly changing dimensions when the team was building data marts to track insurance sales for each month. The sales organization structure would be re-built from the source system (we had a single tier DW system which meant that the production system data was directly input to the data mart) to align with the sales for that month. Within a given month we had integrity within the snapshot data, but if we looked back into the historical sales data using the existing organization structure, we were faced with problems. When we attempted to analyse current sales data using the historical organization hierarchy, we ended up with orphan salespeople (no reporting office), or sales offices that had a revenue position when they weren’t even in existence! As if we didn’t have enough data quality problems, we created some of our own!!!
When we first examined the problem we discovered that Ralph Kimball’s classic interpretation of this problem, that is being caused by a Slowly Changing Dimension, was correct. In our case it wasn’t ‘slow’ (referring to the inconsistent timing of the change), but rather episodic, in that they would change many times within a short time frame, or not change at all for many months. The ETL team was victimized each month as they didn’t know when the change was to happen, to better deal with it.
To address the problem we identified the changing nature of the data in specific ways, in order to better deal with changes over time. The characterization was done as follows (a sample instance in parenthesis) and can be characterized as follows:
Type 1 (birth date) – no temporal association. This easiest of structures is where the current data represents/is the value that has been associated with the tuple since its inception. (Little causal analysis can be constructed, rather only an evaluation of the current or most recent business cycle is accurate.)
Type 2 (product price) – the value at each instance and a temporal association representing the date that the value has changed (for analysis purposes – not necessarily the actual date of the change. This is the most popular of characterizations but can be a performance challenge.
Type 3 (smoker indicator) – only the current value and previous value is of interest. Any additional prior values are of little interest for analysis purposes. Can have some causal analysis value when trying to understand the reasons behind a change in value/state of a measure.
To help the analyst, Erwin has the capability to display this type of characterization on the Physical Side of the model when the Dimensional notation is chosen (Model…Model Properties..General and Notation tabs).
This is helpful to the data analyst and application builder since Data Browser reports can be produced to support the characterization and data mapping to be performed behind the process of loading the data.
This is also helpful to the DBA since it can also indicate where additional database objects can lead to improved performance, such as VIEWS, MQT’s or Indexes.
In this article we’ve extended the data model by illustrating how data changes over time. This is valuable to the data analyst as it defines analysis potential, and for the data warehouse DBA as it gives a better understanding of data growth and where performance issues may arise.