I know that a good percentage of you already know the difference between Data and Metadata. For the benefit of those who do not, and to help us better understand the difference between Data Lineage and Metadata Lineage, we first need a common understanding of Data and Metadata.
Data is a collection of facts from which conclusions may be drawn. It is the facts held in a data store that make one record different from other records (please don’t confuse this with a unique identifier, which is merely a mechanism that lets a database tell records apart).
In the example, the form and all its labels are the metadata, and the filled-in (handwritten) details are the data. Compare this with a record in our data store and the parallels between the two quickly become clear. Extending the example, our filing cabinet would hold multiple copies of this receipt, filled in for different clients and different sales, and each of these is a different record in our data store.
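The form-versus-handwriting comparison can be sketched in a few lines of Python. This is only an illustration: the field names and values below are made up and are not taken from the receipt in the example.

```python
# Metadata: the printed form, i.e. the labels and what each one means.
# (Hypothetical field names, chosen only for illustration.)
receipt_form = {
    "client_name": "Name of the client",
    "sale_date": "Date of the sale",
    "amount": "Amount paid",
}

# Data: the handwritten details, one filled-in receipt per record.
filing_cabinet = [
    {"client_name": "A. Client", "sale_date": "2024-03-15", "amount": 120.00},
    {"client_name": "B. Client", "sale_date": "2024-03-16", "amount": 75.50},
]


def matches_form(record, form):
    # A record fits the form when it fills in exactly the labelled fields.
    return set(record) == set(form)


for record in filing_cabinet:
    print(matches_form(record, receipt_form), record)
```

The form is described once, while the filing cabinet grows with every sale. That is the same relationship a table definition has to its rows.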
Now, with all that in our heads, we can move on to how these concepts of lineage, data and metadata come together. Data Lineage is an understanding of where our data comes from (the line portion) and the way it has been transformed to reach its present state (the age part of the equation). Let’s take an example of a data capturing process: an operator captures some data on an application screen, perhaps a call centre agent capturing a new client’s information. The screen has specific fields to complete and certain checks on the data (e.g. checking that the order date time field is a valid date and time).
Further, this data might need to be passed via a middleware application to another system, which in our simple example is a data warehouse. During the middleware’s processing the data might need to be transformed further: in our example the order date time field might need to be converted into a common date time, potentially based on GMT, because it is going to be combined with orders captured all over the world. This would be another transformation of the data. From here the data would be presented on a data warehouse report, having been transformed along the way into the state that the consumer of the report sees.
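The journey of the order date time field can be sketched as follows. This is a minimal illustration rather than how any particular tool records lineage: the `TracedValue` class, the sample timestamp and the +2 hour offset are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class TracedValue:
    """A value plus the ordered steps that produced it (its data lineage)."""
    value: object
    lineage: list = field(default_factory=list)

    def apply(self, system, description, func):
        # Return a new TracedValue so that earlier states stay untouched.
        return TracedValue(func(self.value), self.lineage + [(system, description)])


def parse_order_datetime(text):
    # Capture-screen check: raises ValueError if this is not a valid date and time.
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


def to_gmt(local_dt, utc_offset_hours):
    # Middleware step: attach the capture site's offset, then convert to GMT (UTC).
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    return local_dt.replace(tzinfo=local_tz).astimezone(timezone.utc)


# Hypothetical order captured at a site two hours ahead of GMT.
captured = TracedValue("2024-03-15 14:30", [("capture screen", "entered by operator")])
validated = captured.apply("capture screen", "checked as a valid date and time",
                           parse_order_datetime)
warehoused = validated.apply("middleware", "converted to GMT",
                             lambda dt: to_gmt(dt, 2))
report_value = warehoused.apply("warehouse report", "formatted for display",
                                lambda dt: dt.strftime("%Y-%m-%d %H:%M GMT"))

print(report_value.value)
for system, description in report_value.lineage:
    print(f"  {system}: {description}")
```

Reading `report_value.lineage` from top to bottom answers both halves of the Data Lineage question: where the value came from and how it was transformed to reach the state the report consumer sees.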
If we look at our enterprise data modelling environment, we should be looking at a ‘Design Layer Architecture’ approach. What this means is that we should divide our Enterprise Model into several distinct and linked layers, each with their own purpose and own value to the business. As can be seen in Figure 4, we should have a Conceptual Enterprise Model that is linked to an Application Logical Model, which in turn is linked to an Application Physical Model, and in turn to the physical databases.
Figure 4 – Data layer Architecture Example
So how does this relate to Metadata Lineage? As stated earlier, the metadata starts at the Conceptual Enterprise Model, where we define that we need to track orders in our business. It also defines that a business order needs to have an order date, but it does not define that this order date needs to conform to any location-specific details.
We then move to the Application Logical Model. At this level we might, as per the data lineage example, want to define that the capturing application’s metadata for the order date and time field must comply with being a valid date and time. Also at the Application Logical Layer, our data warehouse wants the field to be not only a valid date and time but also converted to a common location (e.g. GMT). Hence the metadata is not the same for the two systems.
In the example, the definition (or metadata) in the Conceptual Enterprise Model would be something like ‘The date and time the order was placed’. In the capturing system’s Application Logical Model it would be something like ‘The valid date and time the order was placed’. From this you can see that the metadata has been transformed by adding the requirement that the field be a valid date and time. It is still linked to the Conceptual Enterprise Model’s metadata because it describes the same data, only with an added (transformed) level of information.
Now let’s look at the data warehouse system in the example, where the definition would be something like ‘The GMT date and time the order was placed’. This differs from the definition in the capturing system’s metadata, but it is still derived from the same metadata in the Conceptual Enterprise Model and not from the metadata in the capturing system’s Application Logical Model.
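The three definitions above, and the links between them, can be sketched as a small tree. This is an illustrative structure only: the `Definition` class is an assumption made for the example and does not represent how ERwin stores its models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Definition:
    """One piece of metadata and the definition it was derived from."""
    layer: str
    system: str
    text: str
    derived_from: Optional["Definition"] = None

    def lineage(self):
        # Walk back to the root definition, then list root-first.
        node, chain = self, []
        while node is not None:
            chain.append(f"{node.layer} ({node.system}): {node.text}")
            node = node.derived_from
        return list(reversed(chain))


conceptual = Definition("Conceptual Enterprise Model", "enterprise",
                        "The date and time the order was placed")
capture = Definition("Application Logical Model", "capturing system",
                     "The valid date and time the order was placed", conceptual)
warehouse = Definition("Application Logical Model", "data warehouse",
                       "The GMT date and time the order was placed", conceptual)

for line in warehouse.lineage():
    print(line)
```

Both logical definitions point back to the conceptual one, and neither points to the other. That is the contrast with Data Lineage, where the warehouse value is produced from the captured value.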
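So, in wrapping up, there is a distinct difference between Data Lineage and Metadata Lineage: Data Lineage follows the data as it moves and is transformed from system to system, while Metadata Lineage follows the definitions as they are refined from one model layer to the next. I will go into more detail in future articles, showing how ERwin deals with these two and how you can grow a consistent, linked and successive Enterprise Data Model using these functionalities in ERwin to benefit your organisation.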