Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Insert into a star-schema

I've read a lot about star-schema's, about fact/deminsion tables, select statements to quickly report data, however the matter of data entry into a star-schema seems aloof to me. How does one "theoretically" enter data into a star-schema db? while maintaining the fact table. Is a series of INSERT INTO statement within giant stored proc with 20 params my only option (and how to populate the fact table). Many thanks.

like image 545
shaun Avatar asked Mar 22 '10 23:03

shaun


People also ask

How do you load a fact table in star schema?

In order to load the fact table in this example, all the dimension tables have to be loaded first so that the surrogate keys are available. In the DMC, right-click an application directory in the navigation pane, select New and then Flow. The data flow opens in the right pane. Click the Process Flow tab.

Can we have two fact tables in a star schema?

The Star Schema Design Pattern. A star schema consists of one fact table, and one or more dimension tables. The fact table contains information about changes. In the most basic kind of fact table, each row represents a transaction.

Is star schema still relevant?

The star schema remains relevant no matter the size of your data, although small datasets are the most common when it comes to star schema modeling. The accessibility to simply query the data into facts and dimensions is intuitive and time-efficient.


2 Answers

Start with dimensions first -- one by one. Use ECCD (Extract, Clean, Conform, Deliver) approach.

Make sure that each dimension has a BusinessKey that uniquely identifies the "object" that a dimension row describes -- like email for a person.

With dimensions loaded, prepare key-lookup pipeline. In general, for each each dimension table you can prepare a key lookup table (BusinessKey, PrimaryKey). Some designers choose to lookup the dimension table directly, but the key-lookup can be often easily cached into memory which results in faster fact loading.

Use ECCD for fact data too. The ECC part happens in the staging area, you can choose (helper) tables or flat files for each step of the ECC, as you prefer.

While delivering fact tables, replace each BusinessKey in the fact row with the matching PrimaryKey that you get from a key-lookup table. Once all BusinessKeys are replaced with their matching PrimaryKeys, insert the row into the fact table.

Do not waste you time, use ETL tool. You can download Pentaho Kettle (community edition) for free -- it has everything one needs to achieve this.

like image 159
Damir Sudarevic Avatar answered Sep 18 '22 11:09

Damir Sudarevic


You typically do not insert data into a star schema in the same way you might into a normal form - i.e. with a stored procedure which inserts/updated all the appropriate tables within a single transaction. Remember that the star schema is typically a read-only denormalized model of data - it is (rarely) treated transactionally, and is typically loaded from data that is already denormalized flat - usually one flat file per star.

As Damir points out, typically, you load all the dimensions (handle the slowly changing etc), then load the facts, joining to the appropriate current dimensions to find the dimension IDs (using the business keys).

like image 40
Cade Roux Avatar answered Sep 16 '22 11:09

Cade Roux