Insert into a star-schema

I've read a lot about star schemas, about fact/dimension tables, and about select statements to quickly report data; however, the matter of data entry into a star schema remains unclear to me. How does one "theoretically" enter data into a star-schema db while maintaining the fact table? Is a series of INSERT INTO statements within a giant stored proc with 20 params my only option (and how do I populate the fact table)? Many thanks.

545

asked Mar 22 '10 23:03

shaun

2 Answers

Start with dimensions first -- one by one. Use the ECCD (Extract, Clean, Conform, Deliver) approach.

Make sure that each dimension has a BusinessKey that uniquely identifies the "object" that a dimension row describes -- like email for a person.
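A minimal sketch of a dimension keyed this way, using Python's built-in sqlite3 module. The table name dim_customer, its columns, and the load_customers helper are hypothetical and only illustrate the idea: a UNIQUE constraint on the BusinessKey (email) lets a repeated load skip rows that are already present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,   -- surrogate PrimaryKey
        email        TEXT NOT NULL UNIQUE,  -- BusinessKey (assumed to be email)
        full_name    TEXT
    )
""")

def load_customers(conn, rows):
    """Insert new customers; skip rows whose BusinessKey already exists."""
    conn.executemany(
        "INSERT OR IGNORE INTO dim_customer (email, full_name) VALUES (?, ?)",
        rows,
    )
    conn.commit()

# Two loads with one overlapping BusinessKey (ann@example.com).
load_customers(conn, [("ann@example.com", "Ann"), ("bob@example.com", "Bob")])
load_customers(conn, [("ann@example.com", "Ann"), ("cy@example.com", "Cy")])

customer_count = conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
print(customer_count)  # 3: the repeated BusinessKey was not inserted twice
```

INSERT OR IGNORE is SQLite syntax; other databases offer their own equivalents (for example MERGE or ON CONFLICT), and changed attributes would need separate slowly-changing-dimension handling that this sketch leaves out.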

With dimensions loaded, prepare a key-lookup pipeline. In general, for each dimension table you can prepare a key-lookup table (BusinessKey, PrimaryKey). Some designers choose to look up the dimension table directly, but the key-lookup table can often be cached in memory easily, which results in faster fact loading.
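One way to picture the cached key lookup is a plain in-memory dictionary built once per dimension. This is only a sketch: dim_product, sku, and build_key_lookup are made-up names, and SQLite stands in for whatever database holds the dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_product ("
    "product_key INTEGER PRIMARY KEY, sku TEXT NOT NULL UNIQUE)"
)
conn.executemany("INSERT INTO dim_product (sku) VALUES (?)", [("A-100",), ("B-200",)])

def build_key_lookup(conn, table, business_key, primary_key):
    """Return {BusinessKey: PrimaryKey} for one dimension table."""
    # Identifiers are assumed to come from trusted ETL configuration,
    # never from user input, since they are interpolated into the SQL.
    query = f"SELECT {business_key}, {primary_key} FROM {table}"
    return dict(conn.execute(query).fetchall())

product_lookup = build_key_lookup(conn, "dim_product", "sku", "product_key")
print(product_lookup)  # {'A-100': 1, 'B-200': 2}
```

Each fact row can then resolve its keys with a dictionary access instead of a query per row, which is where the speed-up described above comes from.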

Use ECCD for fact data too. The ECC part happens in the staging area; you can use (helper) tables or flat files for each step of the ECC, as you prefer.

While delivering fact tables, replace each BusinessKey in the fact row with the matching PrimaryKey that you get from a key-lookup table. Once all BusinessKeys are replaced with their matching PrimaryKeys, insert the row into the fact table.
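The delivery step could be sketched roughly as follows. All table and column names (dim_customer, dim_product, fact_sales) and the deliver_facts helper are hypothetical, and setting unresolved rows aside is just one possible policy, not something the answer prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, email TEXT UNIQUE);
    CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, sku   TEXT UNIQUE);
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer (customer_key),
        product_key  INTEGER REFERENCES dim_product (product_key),
        quantity     INTEGER,
        amount       REAL
    );
    INSERT INTO dim_customer (email) VALUES ('ann@example.com'), ('bob@example.com');
    INSERT INTO dim_product  (sku)   VALUES ('A-100'), ('B-200');
""")

# In-memory key lookups: BusinessKey -> PrimaryKey.
customer_keys = dict(conn.execute("SELECT email, customer_key FROM dim_customer"))
product_keys = dict(conn.execute("SELECT sku, product_key FROM dim_product"))

# Cleaned and conformed fact rows, still carrying BusinessKeys.
staged_facts = [
    ("ann@example.com", "A-100", 2, 19.98),
    ("bob@example.com", "B-200", 1, 5.00),
    ("zed@example.com", "A-100", 1, 9.99),  # customer not in the dimension
]

def deliver_facts(conn, staged_rows):
    """Swap BusinessKeys for PrimaryKeys, insert the resolved rows,
    and return the rows whose keys could not be resolved."""
    resolved, rejected = [], []
    for email, sku, quantity, amount in staged_rows:
        customer_key = customer_keys.get(email)
        product_key = product_keys.get(sku)
        if customer_key is None or product_key is None:
            rejected.append((email, sku, quantity, amount))
            continue
        resolved.append((customer_key, product_key, quantity, amount))
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", resolved)
    conn.commit()
    return rejected

rejected_rows = deliver_facts(conn, staged_facts)
loaded_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(loaded_count, len(rejected_rows))  # 2 1
```

Rows that fail the lookup usually point at a dimension that was loaded incompletely, so keeping them for inspection tends to be more useful than dropping them silently.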

Do not waste your time; use an ETL tool. You can download Pentaho Kettle (community edition) for free -- it has everything one needs to achieve this.

159

answered Sep 18 '22 11:09

Damir Sudarevic

You typically do not insert data into a star schema the way you would into a normalized schema, i.e. with a stored procedure that inserts/updates all the appropriate tables within a single transaction. Remember that the star schema is typically a read-only denormalized model of the data: it is rarely treated transactionally, and it is typically loaded from data that is already flattened and denormalized, usually one flat file per star.

As Damir points out, you typically load all the dimensions first (handling slowly changing dimensions, etc.), then load the facts, joining to the appropriate current dimension rows on the business keys to find the dimension IDs.
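The join-based fact load might look like the sketch below, written as a single set-based INSERT ... SELECT. The staging table stg_sales, the is_current flag marking the current version of a slowly changing dimension row, and every other name here are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,  -- dimension ID
        email        TEXT NOT NULL,        -- business key
        city         TEXT,
        is_current   INTEGER NOT NULL      -- 1 for the current version of the row
    );
    CREATE TABLE stg_sales  (email TEXT, amount REAL);
    CREATE TABLE fact_sales (customer_key INTEGER, amount REAL);

    -- Ann moved from Leeds to York, so she has an old and a current row.
    INSERT INTO dim_customer (email, city, is_current) VALUES
        ('ann@example.com', 'Leeds', 0),
        ('ann@example.com', 'York',  1);
    INSERT INTO stg_sales VALUES ('ann@example.com', 42.0);
""")

# Load facts by joining staged rows to the current dimension rows
# on the business key, which yields the dimension ID for each fact.
conn.execute("""
    INSERT INTO fact_sales (customer_key, amount)
    SELECT d.customer_key, s.amount
    FROM stg_sales AS s
    JOIN dim_customer AS d
      ON d.email = s.email AND d.is_current = 1
""")
conn.commit()

fact_rows = conn.execute("SELECT customer_key, amount FROM fact_sales").fetchall()
print(fact_rows)  # [(2, 42.0)]
```

The fact row picks up customer_key 2, the current version of the customer, rather than the superseded row.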

answered Sep 16 '22 11:09

Cade Roux


Tags: database, data-warehouse, star-schema
