Structure within staging area of data warehouse

Tags: sql, database-design, data-warehouse

We are working on a data warehouse for a bank and have pretty much followed the standard Kimball model: staging tables, a star schema, and an ETL process to pull the data through.

Kimball talks about using the staging area for import, cleaning, processing and everything else until you are ready to put the data into the star schema. In practice this typically means uploading data from the sources into a set of tables with little or no modification, then optionally taking it through intermediate tables until it is ready to go into the star schema. That's a lot of work for a single entity; there is no single responsibility here.

Previous systems I have worked on have made a distinction between the different sets of tables, to the extent of having:

Upload tables: raw source system data, unmodified
Staging tables: intermediate processing, typed and cleansed
Warehouse tables

You can stick these in separate schemas and then apply differing policies for archive, backup, security and so on. One of the other guys has worked on a warehouse that had a StagingInput and a StagingOutput, which is a similar story. The team as a whole has a lot of experience, both with data warehouses and otherwise.
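To make the three-layer split concrete, here is a rough sketch rather than anything from a real system. It uses Python's sqlite3 module, with attached in-memory databases standing in for separate schemas on a proper database server. The layer names, table names, columns and the cleansing rule are all made up for illustration.

```python
import sqlite3

# Hypothetical layer names. SQLite has no CREATE SCHEMA, so each layer is an
# attached in-memory database; on a server these would be real schemas.
conn = sqlite3.connect(":memory:")
for layer in ("upload", "staging", "warehouse"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {layer}")

# Upload: raw source system data, everything as text, no constraints.
conn.execute(
    "CREATE TABLE upload.account_raw (account_no TEXT, branch_code TEXT, opened TEXT)"
)
# Staging: typed and cleansed, with constraints enforced.
conn.execute(
    "CREATE TABLE staging.account ("
    "account_no TEXT PRIMARY KEY, branch_code TEXT NOT NULL, opened TEXT NOT NULL)"
)
# Warehouse: a dimension table with its own surrogate key.
conn.execute(
    "CREATE TABLE warehouse.dim_account ("
    "account_key INTEGER PRIMARY KEY, account_no TEXT UNIQUE, branch_code TEXT)"
)

# Made-up source rows, loaded unmodified.
conn.executemany(
    "INSERT INTO upload.account_raw VALUES (?, ?, ?)",
    [
        (" 1001 ", "ldn", "2009-01-15"),
        ("", "nyc", "2009-02-01"),  # no account number: rejected below
        ("1003", "NYC ", "2009-03-10"),
    ],
)

# Upload -> staging: trim, normalise case, drop rows without an account number.
conn.execute(
    "INSERT INTO staging.account "
    "SELECT TRIM(account_no), UPPER(TRIM(branch_code)), opened "
    "FROM upload.account_raw WHERE TRIM(account_no) <> ''"
)

# Staging -> warehouse: the surrogate key is assigned by the warehouse table.
conn.execute(
    "INSERT INTO warehouse.dim_account (account_no, branch_code) "
    "SELECT account_no, branch_code FROM staging.account ORDER BY account_no"
)

tables = {"upload": "account_raw", "staging": "account", "warehouse": "dim_account"}
counts = {
    layer: conn.execute(f"SELECT COUNT(*) FROM {layer}.{table}").fetchone()[0]
    for layer, table in tables.items()
}
print(counts)
```

Because each layer lives under its own name, permissions, retention and backup can be set per layer, and every table's prefix shows which stage of the process it belongs to.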

However, despite all this, looking through Kimball and the web there seems to be absolutely nothing in writing about giving any kind of structure to the staging database. One would be forgiven for believing that Mr Kimball would have us all work with staging as this big deep dark unstructured pool of data.

Whilst it is of course pretty obvious how to go about adding more structure to the staging area if we want to, it is very odd that nothing appears to have been written about it.

So, what is everyone else out there doing? Is staging just this big unstructured mess or do folk have some interesting designs on it?

asked May 14 '09 13:05

NeedHack

1 Answer

I have experienced the same problem. We have a large HR data warehouse and I am pulling data from systems all over the enterprise. I've got a nice collection of fact and dimension tables, but the staging area is a mess. I don't know of any standards for designing it. I would follow the same path you are on and come up with a standard set of names to keep things in order. Your suggestion for the naming is pretty good, and I'd keep working with that.

answered Sep 21 '22 07:09

Christian Loris
