Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Transforming OLTP Relational Database to Data Warehousing Model

What are the common design approaches taken in loading data from a typical Entity-Relationship OLTP database model into a Kimball star schema Data Warehouse/Marts model?

  • Do you use a staging area to perform the transformation and then load into the warehouse?
  • How do you link data between the warehouse and the OLTP database?
  • Where/How do you manage the transformation process - in the database as sprocs, dts/ssis packages, or SQL from application code?
like image 807
Russ Cam Avatar asked May 15 '09 11:05

Russ Cam


People also ask

Is OLTP supported by data warehouse?

A data warehouse provides an OLTP system by supporting a place for the OLTP database to offload data as it accumulates, and by supporting services that can complicate and degrade OLTP operations if they were implemented in the OLTP database.

Is data warehouse OLTP or OLAP?

Data Warehouse is the example of OLAP system. OLTP stands for On-Line Transactional processing. It is used for maintaining the online transaction and record integrity in multiple access environments. OLTP is a system that manages very large number of short online transactions for example, ATM.

What is OLTP in data warehouse?

OLTP defined OLTP or Online Transaction Processing is a type of data processing that consists of executing a number of transactions occurring concurrently—online banking, shopping, order entry, or sending text messages, for example.


2 Answers

Personally, I tend to work as follows:

  1. Design the data warehouse first. In particular, design the tables that are needed as part of the DW, ignoring any staging tables.
  2. Design the ETL, using SSIS, but sometimes with SSIS calling stored procedures in the involved databases.
  3. If any staging tables are required as part of the ETL, fine, but at the same time make sure they get cleaned up. A staging table used only as part of a single series of ETL steps should be truncated after those steps are completed, with or without success.
  4. I have the SSIS packages refer to the OLTP database at least to pull data into the staging tables. Depending on the situation, they may process the OLTP tables directly into the data warehouse. All such queries are performed WITH(NOLOCK).
  5. Document, Document, Document. Make it clear what inputs are used by each package, and where the output goes. Make sure to document the criteria by which the input are selected (last 24 hours? since last success? new identity values? all rows?)

This has worked well for me, though I admit I haven't done many of these projects, nor any really large ones.

like image 127
John Saunders Avatar answered Oct 11 '22 20:10

John Saunders


I'm currently working on a small/mid size dataware house. We're adopting some of the concepts that Kimball puts forward, i.e. the star scheme with fact and dimension tables. We structure it so that facts only join to dimensions (not fact to fact or dimension to dimension - but this is our choice, not saying it's the way it should be done), so we flatten all dimension joins to the fact table.

We use SSIS to move the data from the production DB -> source DB -> staging DB -> reporting DB (we probably could have have used less DBs, but that's the way it's fallen).

SSIS is really nice as it's lets you structure your data flows very logically. We use a combination of SSIS components and stored procs, where one nice feature of SSIS is the ability to provide SQL commands as a transform between a source/destination data-flow. This means we can call stored procs on every row if we want, which can be useful (albeit a bit slower).

We're also using a new SQL Server 2008 feature called change data capture (CDC) which allows you to audit all changes on a table (you can specify which columns you want to look at in those tables), so we use that on the production DB to tell what has changed so we can move just those records across to the source DB for processing.

like image 36
JonoW Avatar answered Oct 11 '22 21:10

JonoW