In this article we’ll discuss the updated Data Movement features in Erwin R8 that identify data sources, data targets, and transformation logic.
You might be wondering why you’d use a data modeling tool to map data movement and refresh rules, to be used for an ETL application, especially when the ETL tools do such a good job on their own. I have worked with data warehouse teams in which users occasionally look at field totals in reports that don’t add-up given the data sources (either OLTP or Warehouse). So the ETL programmer has to then explain the path that the data took to get into the report, and this can be a huge investment of time that should be spent on development activities.
Additionally, users sometimes need to have ready access to metadata that describes the fields in both technical and business terms, and the ETL tool is not the best place to store this information for the reasons just mentioned (the programmer is the only one with the tools).
Enter Erwin and its strengths in design and meta data capture, that will assist in the design of the data flow and integrate with the ETL tool (through the tool-bridge) to leverage the investment in the analysis effort.
Let’s first look at the data flow design activity – in subsequent issues we’ll show you how to report on the meta data and extend the definitions to other tools.
In Erwin, Data Movement is enabled within the Model…Model Properties dialog. The ability to define rules and mappings is available only within the Physical model, since the data mapping process is geared towards actual data that exists within files and databases, not the logical definitions of the data that might appear in the E-R model.
At right is the Model…Model Properties…General tab and the appropriate setting to enable the Model Explorer options…
As of Erwin R8, the UI has been improved to help the modeler identify, at the Table and Column level, Data Movement Rules and Data Movement Sources.
Data Movement Rules allow the designer to define the process for data movement, to choose the tables to be involved in the mapping process, and to specify if the process is fundamentally one of data Refresh, Append, Backup, Recovery, Archive, or data Purge.
Since the model being created represents the target of the data mapping or ETL application, Data Movement Sources allows the designer to define flat file or relational sources that will be mapped at the Table and Column level, along with Comments that can serve as Pseudo Code or Structured English, to be used by the developer within their ETL tool, or to serve as meta data to any casual observer.
Here we have created a Data Movement Rule for the Fact Table named ENTRY. It will be refreshed each week as noted in the Definition tab. The Rule also applies to the other tables in this dimensional model, including all of the Dimension Tables, since they are noted in the Attached Tables window in the lower right (they were moved into the right window from the left or Unattached Tables window). The Where Used, a new feature in R8, is beneficial as it can help track down other uses of the Dimension tables in the model to be sure that the update cycle is consistent. In a later issue I’ll show you how to write a SQL query against the Erwin Meta Model that identifies tables that are not part of any ETL mapping design, so we can track them down, and define their population process, which will improve the overall quality of the design.
Here we have defined a Data Movement Source for the Fact table, and named it DDL_SRC_CDW. In this example the target model is defined as an Oracle 10G Data Mart, but the mapping source is SQL SERVER 2008, since our Central Data Warehouse (CDW) is built with SQL SERVER. Note that the meta data defined in this tab (the fields with a white background anyway) is for the most part arbitrary, but that’s not to say that it shouldn’t be factually correct.