07 March 2008

Data Warehousing 101: Requirements for a Data Warehouse

Yes, it has been a while since I added an entry in my blog. I have been working on creating a dynamic data warehouse system reliant on the traditional LAMP stack (and a very nifty graphical plug-in - please comment below if you wish to know what it is!).

Firstly, I must make these qualifications before you read further:
  1. I consider Bill Inmon and Ralph Kimball the pioneers of data warehousing;
  2. Data warehousing has been a relatively recent concept, since 1983 when Teradata introduced a database decision support system, and most current day concepts we understand about data warehousing have been taking shape over the past 10 years;
  3. I am new to this as well, but am very satisfied with the results I am obtaining.
OK... What does a data warehouse look like? This is a succinct high-level view as aptly described on the data warehouse entry on wikipedia:
  • Subject-oriented - data is organized so that all data elements relating to the same real-world event or object are linked together.
  • Time-variant -changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time.
  • Non-volatile - data in the database is never over-written or deleted - once committed, the data is static, read-only, and retained for future reporting.
  • Integrated - database contains data from most or all of an organization's operational systems and this data is made consistent.
This premise draws us to what appears to be a centralized solution for a data warehouse. However, you can approach it on a smaller scale by identifying the basic bones of your database structure, thus enabling you to produce results in a shorter span of time. Note: the building blocks of data is not done according to departments - for example, sales, marketing, customer service, and so on.

Instead, you can establish what your basis for your data is through fact tables and dimension tables.

A fact table is the "primary table where the numerical performance measurements of the business are stored" (The Data Warehouse Toolkit, second edition, p.16). The most useful facts of this table are numeric and additive.

Daily Sales Fact Table

Date Key (FK)
Product Key (FK)
Store Key (FK)
Quantity Sold
Dollar Sales Amount
A dimension table, the companion to the fact table, is the user interface to the data warehouse.

Production Dimension Table

Product Key (PK)
Product Description
SKU Number (not to be used as a key in the warehouse)
Brand Description
Category Description
Department Description
...more


Bringing together facts and dimensions becomes easier across all levels of business, as users agree that the dimensional model is their business. In our examples above, store operators use the store dimension table to relate to the number of products sold daily; marketing and product development refer to the attributes of the product dimension table to make conclusions about which products to promote and develop further.

Next I will relate common pitfalls known in the early stages of data warehousing and how to overcome them.

Until Next Time,
David

No comments: