What's the Purpose of a Datawarehouse?
I've been meaning to write this since I first began my foray into the world of datawarehousing back in December of 2006.
What is the purpose of a Datawarehouse?
- Reporting
- Key Performance Indicators (KPI)
- Performance - Data is stored in such a way (facts and dimensions) which enables higher performance against the same relational tables in an OLTP system.
- Slicing and Dicing - Viewing the data from multiple angles (dimensions) is the one of the best aspects of the Datawarehouse. Also included is the ability to drill down into the finer grained details.
- Single Source of Truth - Especially important in an enterprise environment which may have many systems of record. The datawarehouse allows you to aggregate all those systems into one environment. Not always completed, it's more of an ideal than anything.
- Clean Source Data (Transformation) - The datawarehouse allows you to "fix" the sub-par OLTP systems. Think garbage in, garbage out.
From my point of view as a database developer, #3 is the most important. I believe that the "Transformation" stage of ETL should be easy if only the source system had been designed properly. I've seen instances where a column (VARCHAR2(30) for example) meant one thing at one point in time and another at a different point in time. Or a data was stored in the column as MM/DD/YYYY, DD-MON-YY or Month DD, YYYY. How do you clean that up?
I also recognize that in an enterprise environment you can have the best designed OLTP systems but you still have to map the data to one field.
Ultimately, I think #1 should drive the design of all OLTP systems, making concessions where needed for performance. From there, your datawarehouse should flow fairly naturally. The ability to report on the data that you gather allows the business to make decisions for the future. In other words, it's the data stupid.
Labels: datawarehouse