Tim Seyfried, Principal Consultant, Data Analytics Practice
The leap from traditional batched loaded DW, and NRT DW is smaller than the jump from transactional reporting to the data warehouse.
Conceptually the structure you want at the end is the same as a traditional DW.
This can be broken down into three discreet steps.
- Transactional to STG/ODS
- STG/ODS to physical reporting
- Logical layer between physical and end user
Looking specifically at the first step:
The goal here is to limit the footprint on the transitional system. There is no good way to limit the refresh activity other than to make it as fast as possible. So, no joins, lookups, or transformations. These only slow down the extract process, and since most tools drive down the conversion to the source, it increases your effect on the source system.
Capture must run on the source system. In very old implementations of CDC using CDC streams the apply, capture and therefore CDC transactional tables are also in the source. if possible, logs should be transported to the target and the apply process run on the target system.
Exceptions to transformation should be limited. The one exception that I have allowed for is the conversion of dates. Transactional systems often use a Julian date of their design since this is a calculated transformation that can be done via a set formula it doesn’t significantly slow down the extract so it can be an exception to the rule. You should never do lookups.
Stage all your tables for one transactional system in one place regardless of purpose this will prevent tables from being staged multiple times for multiple purposes.
Resist the urge to filter either rows or columns out and pull all the data available for a transactional table. These rules will significantly reduce rework going forward.
General guidelines for ODS structures.
- Tables should match the source system in structure (field length and types).
- Table Names should be given transactional and business names.
Transactional Table: F0005
ODS/Stage Table F0005_USERDEFINEDCODES
- Column names as much as is possible should be given transactional and business names.
Transactional Column Name: DRSY
ODS/Stage Column Name: DRSY_INSTALLSYSTEMCODE
- No foreign key constraints or indexes from the source should be carried into the ods/stage. Carrying these forward prevents parallelization of change data apply.
We will discuss the other two stages in future posts.