Modelling the Data Warehouse

This issue – we discuss types of models, their characteristics, nature, and application.

Welcome to Data Warehouse Corner. Each issue we’ll identify a business problem facing the Data Warehouse / BI data modeller and discuss solutions and implications for the ERwin modeller. Let’s start by discussing the differences between traditional data models and those developed for the data warehouse and multi-dimensional data marts.

We get into some great debates in the Sandhill course – Dimensional Modeling Fundamentals, and while we come away with a sound understanding of the differences between ‘traditional’ (let’s call them OLTP) models, data warehouse models, and dimensional models, each customer has a specific twist on their depiction and characteristic of the models in their shop. This can arise because of the perception that what is seen in the data warehouse or data mart, is what was borne out of the modeling effort that produced the eventual data stores – but this be based upon a contradiction in fundamentals.

The data model is first and most importantly, a communication enabler (my sincerest apologies to the folks who think engineering ability is the most important capability of the data model). The data model’s value must be measured by its ability to define the scope, the data definition, and solution (in data terms) to a business requirement, and reflect the information engineer’s interpretation of the solution.

 With modeling OLTP systems, we start by having the user generate broad requirements (External Design) , narrow down the scope to a refined set of business requirements (Internal Design), and produce a set of data structures that represents our solution – so in effect we start with the requirements and end with the data that represents our solution design.

With modeling BI/DW systems, we start with a broad base of data (Source systems), identify the business questions to be addressed, and identify the roles that the data plays as part of the solution. The end result is a set of requirements, that are iteratively implemented, and the question changes as soon as the requirements are met, leading to more requirements. So we start with the data and end up with the requirements!

At the end of the day OLTP and DW/BI models that we produce may appear to be the same, and may communicate to a similar target audience, but because they are built with a different approach and guiding set of principles, they can often look very different in nature and structure. Let’s examine the guiding set of principles a little more closely.

Principles of the models:

– Partitioned along business or development lines – Data from various business lines is grouped into ‘Subjects’ – Designed to easily map on to OLAP tools and spreadsheets
– High availability in a closed environment – Logical model closely represents enterprise model deliverables – Relationship issues minimized due to conformed keys
– Predictable access to a small unit of data at one time – Integrity is assumed – Identify a single aspect and how it will be analyzed
– Relationship variables – dependency, optionality, cardinality – Special attention paid to history and time structures – Consumption pattern can alter data’s appearance
– Domain, consistency and referential integrity is of primary concern – More generic / conformed to present all necessary business events – Relationships only identify navigation
– Various methodologies / degree of normalization (hovers around 3NF) – Normalization is based upon different fundamentals (next issue!) – Normalization is really not applicable

From the principles we can conclude:

OLTP models provide both graphical and lexical documentation of the business area under examination, and allow us to follow principles of object reuse (inclusion) by making structures as applicable across enterprise. Within OLTP systems we can analyze data structures in isolation of business processes. Performance optimization, information scalability, and data security can be achieved, and ultimately, the most important requirement, data integrity, can be maintained (or heaven forbid – explained). OLTP models also form the base structure to evolve an enterprise data foundation/architecture and provide support for model-driven development.

Data warehouse models provides the ability for the users and technical staff to match requirements to the base of existing data, and achieve the best source(s) of data through consolidation and integration of clean data, without the encumbrance of systems security. Data warehouse modeling offers the ability to identify the many, if not 1, “version of the truth”.

Dimensional models provide ‘a picture’ of the data that answers a specific, or narrowly defined, and related, set of business questions. Each data element plays a specific role in the analysis (Measure, Dimension, etc…), and a data element can play a different role in answering different questions (e.g. Customer can be a Measure or a Dimension – it depends on the question). So in the end we have different models to be produced based upon unique and specific fundamentals.

Leave a Reply

Your email address will not be published. Required fields are marked *