To build a Data Warehouse or NOT!!

Tim Seyfried, Principal Consultant, Data Analytics Practice

May 2019

As I start to build this Data Analytics Practice at Cintra, I get asked if making a traditional data warehouse is the right thing to do. My knee jerk reaction is yes, but that is because I have developed several very successful data warehouses as a foundation for advanced analytics practices at consumer products companies. If asked a more specific question should we be building an enterprise data warehouses today my answer would be no. Think enterprise analytics practice instead. Every analytics project I have been on that aspires to build an enterprise data warehouse has been a disappointment, and will always fall short of the goal.

The standard definition of a data warehouse:

“In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for workers throughout the enterprise.”

This definition, to me, defines the pipe dream of an enterprise data warehouse. Let’s take a step back and let me clarify what I think a data warehouse is.

A data warehouse (DW) is a system used for reporting and data analysis. Generally, it should have a single primary transactional source that is considered the system of record. It is a curated data repository that is the authority of truth for reporting the transactional level of data for its source within an organization.

Does this mean that you need to build a separate data repository as we have for many years? No, it makes sense that a DW system doesn’t limit itself in such a way but be adaptive to the capabilities of the architecture it finds itself. In many cases, it still makes sense to build a separate hardware and software infrastructure so as not to limit the data warehouse to the architecture put in place to support transactional processing.

Let’s define the features of what a successful data warehouse system should look like:

  • It should be the only means for which reporting for that transactional system. When other options to get data exists, there will always be an alternative version of the truth. Not that alternative versions aren’t acceptable but if they start from the curated version of the fact stored in a DW system then at least they can be pinned back to that version.

  • If a DW system is the only reporting allowed, then it needs to have a near real-time capability and a snapshot capability to be successful. I have built data warehouses that provided pick list to forklift operators under the same constructs as yearly sales analysis for the c suite and daily position analysis.

  • It should be the central source for feeding none auditable data to other systems. For example, it should be the source for supplying sales and inventory data to the forecast and planning systems, and it should be a source of data into a broader analytical architecture.

  • If at all possible compliance and data quality processes should be done outside of the actual data warehouse system. For example, if an inventory report is used as part of an auditable business process then that process should have steps embedded that ensure the accuracy of the data back to the system of record. Applying audit level controls to the data warehouse system will drive analytics that does not require this level of scrutiny into business-driven shadow IT initiatives and will ultimately make the data warehouse system less efficient and adaptable to business needs.

  • By making the data warehouse tight with a transactional system makes it easier to find off the shelf solutions tied to that particular system, especially if the goal isn’t to build an enterprise-level data warehouse.

  • The data warehouse or better said a set of data warehouses should be the foundation of a larger advanced analytics platform.