What are the common design approaches taken in loading data from a typical Entity-Relationship OLTP database model into a Kimball star schema Data Warehouse/Marts model?
A data warehouse provides an OLTP system by supporting a place for the OLTP database to offload data as it accumulates, and by supporting services that can complicate and degrade OLTP operations if they were implemented in the OLTP database.
Data Warehouse is the example of OLAP system. OLTP stands for On-Line Transactional processing. It is used for maintaining the online transaction and record integrity in multiple access environments. OLTP is a system that manages very large number of short online transactions for example, ATM.
OLTP defined OLTP or Online Transaction Processing is a type of data processing that consists of executing a number of transactions occurring concurrently—online banking, shopping, order entry, or sending text messages, for example.
Personally, I tend to work as follows:
This has worked well for me, though I admit I haven't done many of these projects, nor any really large ones.
I'm currently working on a small/mid size dataware house. We're adopting some of the concepts that Kimball puts forward, i.e. the star scheme with fact and dimension tables. We structure it so that facts only join to dimensions (not fact to fact or dimension to dimension - but this is our choice, not saying it's the way it should be done), so we flatten all dimension joins to the fact table.
We use SSIS to move the data from the production DB -> source DB -> staging DB -> reporting DB (we probably could have have used less DBs, but that's the way it's fallen).
SSIS is really nice as it's lets you structure your data flows very logically. We use a combination of SSIS components and stored procs, where one nice feature of SSIS is the ability to provide SQL commands as a transform between a source/destination data-flow. This means we can call stored procs on every row if we want, which can be useful (albeit a bit slower).
We're also using a new SQL Server 2008 feature called change data capture (CDC) which allows you to audit all changes on a table (you can specify which columns you want to look at in those tables), so we use that on the production DB to tell what has changed so we can move just those records across to the source DB for processing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With