PDF
Print
E-mail
Written by John Vaughan
Friday, 10 July 2009 10:55

what is a data warehouse?

A data warehouse is the core of any Performance Management application.  If you like, it is the glue that binds it all together.  It takes data from all the disparate sources, massages it so it is presented in a consistent and usable form and presents it for use in a business intelligence environment.  It is not an absolutely vital cog in the wheel of a successful BI implementation, but without it, getting the most value out of your Performance Management investment will be difficult.

data warehousing basics

fact tables

Data warehouses use data from other systems and combine it all for use by analysis and reporting tools. The output is a set of data tables (called facts) and structural tables (called dimensions). For example, you might have a data, or fact, table for Sales, another for Inventory and another for Debtors.  In these fact tables we will normally store a column with reference a reference to each of the structure, or dimension, tables.  So in the Sales fact table, for example, we might have a Product reference, a Customer reference, Date reference and possibly a Sales Rep reference.

dimensions

These references in the Fact tables provide the links to the dimension tables where the rich and potentially highly detailed information on each of those subject areas is kept. If we think of the Product dimension table for example, we would normally keep all relevant information on products in this table.  This might include, for example, Product Description, Product Group ID, Product Group Description, Business Unit ID, Business Unit Description, Product Manager, Supplier etc.  Essentially, all the descriptive information about the products we sell but not any of the values associated with them, so no sales, orders, stock levels etc.

star schema

There are a couple of schools of thought then about how to treat the dimensions (those by Inmon and Kimball).  Our preferred method is to design what is called a Star Schema.  Here, if you can imagine the fact table is at the centre of the star and then the dimension table sit around the fact table, kind of like a star.

Sales Facts plus Relevant Dimensions

Obviously, we can then share dimensions from one fact table to another.  So, we would normally share the Customer and Time dimensions with both Sales facts and Debtor facts, or the Product dimension with both Sales and Inventory facts.

Sales and Debtor Facts plus Conformed Dimensions

Eventually we end up with a series of relevant facts connected to the relevant, or conformed, dimensions.

Multiple Facts with Conformed Dimensions

Other Decisions

There are various other concepts that need to be considered when contemplating a data warehouse and some of those include the need for slowly changing dimensions, how often you want to refresh the data, whether you need to use a master data management solution to manage your dimensions, whether you need an automated solution for completely building and deploying your data warehouse (like Wherescape Red), or should you use a tool like IBM InfoShere, or whether you can build it just using the tools available in your chosen database, like SQL Server.

why should your organisation have a data warehouse?

There are a number of reasons you should consider having a data warehouse in your organisation

Central Repository

Your organisation will get the most out of a data warehouse when it is established as the central repository of all your business information and used as the primary source data for decision making.  It should be fed by all your systems, ERP, CRM, Operational Forecasts, Corporate Planning, by data external to your organisation, be that from Customers, Suppliers, statutory bodies like the ABS or industry data such as that provided by IMS to the pharmaceutical industry.  This "single version of the truth" can then be used throughout your organisation and negate the need for complex Excel files, csv extracts and manipulation to arrive at the "truth".

Inter Application Reporting

This is one of the biggest reasons for staging your data into a data warehouse or data mart.  With data from all sources loaded you will be able to analyse information from multiple sources at once.  For example, you might like to analyse your sales data from your ERP combined with sales rep call data from your CRM.  What this allows is a full picture of what is happening in the business, rather than an application specific silo of information.

Source for Data Synchronisation

How many systems do you have with duplicated data?  Where, for example you store your customer's address.  This would undoubtedly be in your accounting system, it is probably also in your sales management system or CRM and it might be in a marketing database for example.  Your datawarehouse can be used as the synchronisation source for all of these so that if an address changes in one system, then it is updated in others.

Reduce ERP Licenses

Many companies use data warehouses to eliminate the need to purchase additional licenses of their primary ERP system where those licenses were going to be just used for reporting.  Why spend thousands of dollars on an additional ERP license when you could spend a fraction of that and offer your users a far richer environment for analysis and reporting, with all the information they need to manage their part of the business.

Reduce Transactional System Load and Improve User Performance

Does your business currently run complex reports direct from your transactional system?  Think of a sales report.  That needs to get data from (usually) sales order lines, sales order headers, a product table (for the product names), multiple product hierarchy tables (to summarise the report into business units etc), a customer table (for the customer name), multiple customer hierarchy tables (for channel information), date tables, rep tables etc.  Very quickly you can have a single report that is hitting 20 or more tables in your transactional system, each table requiring join and filter logic that will take up system resources on your ERP system and potentially long wait times for a result to be presented to users.  Having that same data staged into a data mart or data warehouse means that the queries and data structures are optimised for reporting, so they are delivered much faster, and that the extract of data can be managed at an appropriate time of your business day.

cognos_bilogo_wherescape

Last Updated ( Thursday, 30 July 2009 14:34 )