Skip to main content

“Data must be managed properly before a data warehouse can provide a real return on investment”

— Stan Becton, President of Acton Burnell Inc.
Why you need a data warehouse (DW)

A data warehouse is a necessary foundation for a true Business Intelligence solution. Almost all sophisticated, mature BI systems are based on one. An enterprise data warehouse should be a trusted single source of the truth.

How to build a data warehouse

How do you build a data warehouse is exactly what we intend to address in this article.

Requirements Gathering

15%

If you are at this point, you have realized the fact you are needing a data warehouse (DW).  A large part of building a successful DW starts here at the requirements phase.

Don’t worry if you don’t capture everything because you won’t. I’ve worked with some of the best developers, architects, and managers and there has always been a situation where we hear, “ah, we did not think of that”. Focus on the big items and be able to accept changes as needed.

Design & Architecture

35%

Data warehousing systems, like home designs, have many different architectural options.

At this point, we should be considering the architect of the data warehouse. There are three options, however, there is one that is widely used and considered best practice. Let’s review them all.

Single-tier architecture – The objective of a single layer is to minimize the amount of data stored. This goal is to remove data redundancy. This architecture is not frequently used.

Two-tier architecture – The Two-layer architecture separates physically available sources and data warehouse. This architecture is not expandable and does not support a large number of end-users.

Three-tier architecture – This is the most widely used architecture. It consists of the Top, Middle and Bottom Tier.

  1. Bottom Tier: The database of the Data warehouse serves as the bottom tier. It is usually a relational database system. Data is cleansed, transformed, and loaded into this layer using back-end tools.
  2. Middle Tier: The middle tier is an OLAP server that is typically implemented using either a relational OLAP (ROLAP) model or a multidimensional OLAP (MOLAP) model.
  3. Top-Tier: The Top-Tier is a front-end client layer, which contains query and reporting tools, analysis tools, and/or data mining tools (e.g., trend analysis, prediction, and so on)

In our Three-tier, we name the tiers as:

  1. Primary Layer as the bottom tier
  2. Conformed as the middle tier
  3. Presentation for the top tier

Another item to take into consideration is the design options such as Top-Down, Bottom-Up or the Hybrid approach. This is where we could get into some great conversations in regards to Bill Inmon’s approach or Ralph Kimball’s approach. Many agree that a combination of the two called the “Hybrid” is the best and most commonly used.

Development & Testing

80%

By this time, you have a good idea of your data warehouse and it’s design, architecture, and in the requirements phase, you have documented all the sources which will feed into the data warehouse. You will also have a technical document you can hand off to your data warehouse developers so that they get to work. This is a great time to get your team into sprint cycles. Take chunks of your technical document and create a project timeline. Tackling one organization’s data per sprint will allow your testers to get started while you are working on the next organization.

We like to follow these steps for modeling the data.

  1. Identify the business process/es you want to track
  2. Choose the granularity of the fact data
  3. Strip out the dimensions
  4. Consolidate the facts

We suggest that you also start creating SSIS templates for each of the Three-tier layers to help speed things along. We have written about this in the past and you can view more by taking a look at this link SSIS Standardization.

User Acceptance

90%

So far the above has just touched a few considerations you need for building your data warehouse. However, we left out some things that just get us out of scope for this article. One of those things we did not go into is data governance. A data steward would be helpful at this point to verify that the articles defined in the data governance were followed. A good test plan should have been created by this point.

Deployment

100%

You nailed your sprint and the user acceptance/testing and you are now ready to deploy. We commonly use different environments for which we do our development, testing and production, so hopefully you used used a SSIS template utilizing package configurations. If you did use the package configurations, the process of switching between environments should be simple. There are many ways to deploy from your testing environment to production, so pick one that works best for you and stick with it. If you can, use a tool for checking in and checking out your SSIS packages (ie. Team Foundation).

One last note, everything has been tested to this point, however, we like to do one last step before giving the sprint our final stamp of approval and thus scheduling the SSIS packages to run automatically. This last step includes testing the slowly changing dimensions (SCD). It’s a simple thing to do and you will sleep better if you do it.

Congratulations on your sprint release to production. It’s time to reward your team and prepare for the next sprint.

We Build Data Warehouses

Like Nothing You've Seen. Contact Us Today & Let Us Build You A Scalable Data Warehouse

Get in Touch