14 March 2008

Data Warehousing 101: Pitfalls to Avoid

Sometimes technology gets in the way of getting things done, and us tekkies must do what we can to make our data warehouse work for the purpose it is intended. Ralph Kimball makes 10 good points about the common pitfalls to avoid when implementing a data warehouse, as listed in his book "The Data Warehouse Toolkit", published by Wiley.

  • Become immersed with technology rather than the requirements and the goals of business;
  • Fail to recruit an influential and accessible management visionary to sponsor the data warehouse;
  • Turn this into a multi-year project instead of pursuing manageable iterative development;
  • Run out of budget while creating a normalized data structure, before building a viable presentation area;
  • Pay more attention to back-room ease of development over front-room ease of use and query performance;
  • Make queryable data in presentation area too complex, thus causing users to refer to developers;
  • Populate dimensional models on a standalone basis without regard to the data architecture that ties them together;
  • Load only summarized data into presentation area's dimensional tables;
  • Presume that the business, its requirements and analytics and the underlying data and technology are static;
  • Neglect to acknowled that data warehouse success is tied directly to user acceptance.

Next: using MySQL and php to extract, transform and load data (ETL) into the warehouse

Until Next Time,

07 March 2008

Data Warehousing 101: Requirements for a Data Warehouse

Yes, it has been a while since I added an entry in my blog. I have been working on creating a dynamic data warehouse system reliant on the traditional LAMP stack (and a very nifty graphical plug-in - please comment below if you wish to know what it is!).

Firstly, I must make these qualifications before you read further:
  1. I consider Bill Inmon and Ralph Kimball the pioneers of data warehousing;
  2. Data warehousing has been a relatively recent concept, since 1983 when Teradata introduced a database decision support system, and most current day concepts we understand about data warehousing have been taking shape over the past 10 years;
  3. I am new to this as well, but am very satisfied with the results I am obtaining.
OK... What does a data warehouse look like? This is a succinct high-level view as aptly described on the data warehouse entry on wikipedia:
  • Subject-oriented - data is organized so that all data elements relating to the same real-world event or object are linked together.
  • Time-variant -changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time.
  • Non-volatile - data in the database is never over-written or deleted - once committed, the data is static, read-only, and retained for future reporting.
  • Integrated - database contains data from most or all of an organization's operational systems and this data is made consistent.
This premise draws us to what appears to be a centralized solution for a data warehouse. However, you can approach it on a smaller scale by identifying the basic bones of your database structure, thus enabling you to produce results in a shorter span of time. Note: the building blocks of data is not done according to departments - for example, sales, marketing, customer service, and so on.

Instead, you can establish what your basis for your data is through fact tables and dimension tables.

A fact table is the "primary table where the numerical performance measurements of the business are stored" (The Data Warehouse Toolkit, second edition, p.16). The most useful facts of this table are numeric and additive.

Daily Sales Fact Table

Date Key (FK)
Product Key (FK)
Store Key (FK)
Quantity Sold
Dollar Sales Amount
A dimension table, the companion to the fact table, is the user interface to the data warehouse.

Production Dimension Table

Product Key (PK)
Product Description
SKU Number (not to be used as a key in the warehouse)
Brand Description
Category Description
Department Description

Bringing together facts and dimensions becomes easier across all levels of business, as users agree that the dimensional model is their business. In our examples above, store operators use the store dimension table to relate to the number of products sold daily; marketing and product development refer to the attributes of the product dimension table to make conclusions about which products to promote and develop further.

Next I will relate common pitfalls known in the early stages of data warehousing and how to overcome them.

Until Next Time,