Skip to content

Successful BI project – Step 3 – Architecture

2011/09/28

The model below is a schematic description of the refining process steps that source data pass through on its way to the end user. The name of the steps indicates their main purpose and responsibility of the components which are active in each step.

Solution overview

Source

To make it easier to read source data there has to be rules defining how and when data will be sent to the Data Warehouse. There also has to be rules about how the information itself should look, column names, formats etc. Those rules must be set based on the needs and requirements present.

A person responsible for the source system should also be selected and that person should see to that the rules are followed and that the developers get the information they need. To get the correct data delivered a source system analysis must be performed with a person having good knowledge about the source system.

My recommendation is to let the source system “deliver” data as opposed to letting the BI-system fetch data directly from the source system. I say this because the responsibility that the correct data is delivered should be on the source system owner. This also requires the source system to maintain the data extractions when changes in the source system occur.

ETL

To get a scalable and functional BI system there has to be an architecture supporting this. The solution I use consists of four different areas that in turn contain tables for storage of information for different information and purposes. I use the term areas because the implementation can look different depending on the platform you are on. The areas can for example be schemas or databases.

The picture below shows what it could look like.

Dragspel 

Source

This is the area where data should be delivered. It could be delivered as a text file on the file system or in a table in a database. Data in this area is stored exactly as the source system delivers it and isn’t changed.

Stage

Stage area is a temporary storage area used by the ETL process. For data to enter the stage area it has to pass defined rules. The rules can be that a field can’t be null, that a value must be in a certain span or that a date should be in a certain month. If the row is approved it can enter the stage area. Otherwise the row will be rejected and put in an error table along with a date and an error message.

Data Warehouse (Storage)

The Data Warehouse must be built to be scalable and flexible. Data from different source systems will be brought together in this area into a common structure. Fact (transactions) shall be stored at lowest level and the source systems natural key should be stored in a column as a unique key. This way you get a link back to the source system if needed.

My recommendation is not to let users access this area but to create a data mart for that purpose.

Star (Access)

Data in this area is stored in a star schema with dimensions and fact tables to make a foundation for reports and OLAP cubes and to be easily understandable for end users.

Dimensions have a simple primary key whereas fact tables have a composite key including the dimension keys. Data in fact tables should be aggregated to the level needed.

Configuration

Mappings and translations should be saved in a separate database. Mappings can be that one product from one system must be mapped to a product from another system. A translation can be from one product “P01” to a more understandable name like “Bicycle”. The reason for this area not being part of another area is that configuration data shall be persistent and that end users might change data in this area (i.e. Mappings and translation might change over time).

Function

The function area refers to the task of enabling different types of interactions with information from the Data Warehouse. In this area there are a bunch of suppliers, standards and tools available. Generally you can say that OLAP and standard reports are typical tools used but AdHoc query, data mining, simulation and Geographic information system (GIS) can be part of the function area.

Advertisements
Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: