I want to populate a star schema / cube using SSIS / SSAS. I have prepared all of my dimension tables and my fact table, including primary keys. The source is a 'flat' (item-level) table, and my problem is how to split it up and load it from that one table into the respective dimension and fact tables. I did a fair bit of googling but couldn't find a satisfying solution. One would imagine that this is a rather common situation in BI development. Thanks, alexl

Best Practise to populate Fact and Dimension Tables from Transactional Flat DB

2 Answers

For a start, it depends on whether you want to do a simple initial data transfer or something more sophisticated (e.g. incremental). I'm going to assume you're doing an initial data transfer.

Say your item table has columns as follows: id, cat1, cat2, cat3, cat4, ... Assuming the dimension tables for categories 1-4 each have the columns id, cat_name, you can load dim_cat1 (the dimension table of item category 1) as follows:

insert into dim_cat1 (cat_name)
  select distinct cat1 from item_table;

You can do the same for all of the other categories/dimension tables. I'm assuming your dimension tables have automatically generated IDs. Now, to load the fact table:

insert into fact_table (id, cat1_id, cat2_id, cat3_id, cat4_id, ...)
  select it.id, dc1.id, dc2.id, dc3.id, dc4.id, ...
    from item_table it
      join dim_cat1 dc1 on dc1.cat_name = it.cat1
      join dim_cat2 dc2 on dc2.cat_name = it.cat2
      join dim_cat3 dc3 on dc3.cat_name = it.cat3
      join dim_cat4 dc4 on dc4.cat_name = it.cat4
 ...
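To make the two steps concrete, here is a minimal end-to-end sketch that runs the same SQL against an in-memory SQLite database from Python. The choice of SQLite, the two-category schema and the sample rows are all assumptions made for illustration; only the table and column naming follows the statements above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flat source table, cut down to two category columns.
    create table item_table (id integer primary key, cat1 text, cat2 text);

    -- Dimension tables with automatically generated ids.
    create table dim_cat1 (id integer primary key autoincrement, cat_name text);
    create table dim_cat2 (id integer primary key autoincrement, cat_name text);

    create table fact_table (id integer primary key, cat1_id integer, cat2_id integer);

    -- Made-up sample data.
    insert into item_table (id, cat1, cat2) values
        (1, 'books', 'paper'),
        (2, 'books', 'digital'),
        (3, 'music', 'digital');

    -- Step 1: load each dimension with the distinct values from the flat table.
    insert into dim_cat1 (cat_name) select distinct cat1 from item_table;
    insert into dim_cat2 (cat_name) select distinct cat2 from item_table;

    -- Step 2: load the fact table, looking up each generated id by name.
    insert into fact_table (id, cat1_id, cat2_id)
      select it.id, dc1.id, dc2.id
        from item_table it
          join dim_cat1 dc1 on dc1.cat_name = it.cat1
          join dim_cat2 dc2 on dc2.cat_name = it.cat2;
""")

# Join the fact table back to the dimensions to check the round trip.
rows = conn.execute("""
    select f.id, dc1.cat_name, dc2.cat_name
      from fact_table f
        join dim_cat1 dc1 on dc1.id = f.cat1_id
        join dim_cat2 dc2 on dc2.id = f.cat2_id
     order by f.id
""").fetchall()
print(rows)
```

One thing to watch with this approach: the inner joins silently drop any source row whose category value is NULL, because NULL never equals a dimension name. A left join, or a dedicated 'unknown' dimension row, is one way to keep such rows.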

If you have a substantial amount of data, it might make sense to create indexes on the category names in the item_table and maybe the dimension tables.
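As a rough illustration of that indexing advice, the sketch below creates one index per side of the lookup join and prints the query plan SQLite chooses. The index names are hypothetical, SQLite is only a stand-in engine, and making the dimension index unique is an extra assumption (it also prevents duplicate category names); the plan text will differ on other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table item_table (id integer primary key, cat1 text);
    create table dim_cat1 (id integer primary key autoincrement, cat_name text);

    -- Hypothetical index names. The unique index is an assumption:
    -- it speeds up the name lookup and rejects duplicate category names.
    create unique index ix_dim_cat1_name on dim_cat1 (cat_name);
    create index ix_item_cat1 on item_table (cat1);
""")

# Ask SQLite how it would execute the lookup join used for the fact load.
plan = conn.execute("""
    explain query plan
    select it.id, dc1.id
      from item_table it
        join dim_cat1 dc1 on dc1.cat_name = it.cat1
""").fetchall()

# The last column of each plan row holds the human-readable step.
plan_text = "; ".join(row[-1] for row in plan)
print(plan_text)
```

Whether the indexes pay off depends on data volume and on the engine's optimizer, so it is worth comparing the plan and load time with and without them.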

Btw, this is a database-independent answer; I don't work with SSIS/SSAS. You might have tools available which streamline parts of this process for you, but it's really not that difficult or time-consuming to write in plain SQL.

146

answered Oct 18 '22 23:10

Tomislav Nakic-Alfirevic

We do this by using a data flow task to copy everything that has changed since the last package execution time into temporary staging tables. We then update the archive/warehouse with the data from those staging tables based on a key, and insert the rows which don't exist yet. Finally, we truncate the staging tables ready for next time and add a load of auditing. Job done.
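The update-then-insert step can be sketched in plain SQL. The example below runs it against an in-memory SQLite database from Python; the table names (warehouse, staging), the columns and the sample rows are hypothetical, the staging table is assumed to be already filled by the data flow task, and auditing is left out. SQLite has no TRUNCATE statement, so a plain delete stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical warehouse table and a staging table of the same shape.
    create table warehouse (item_key integer primary key, amount integer);
    create table staging (item_key integer primary key, amount integer);

    insert into warehouse values (1, 10), (2, 20);
    -- Rows changed or added since the last run: key 2 changed, key 3 is new.
    insert into staging values (2, 25), (3, 30);
""")

def merge_staging(conn):
    """Apply staged rows to the warehouse, then empty the staging table."""
    with conn:  # one transaction: commit on success, roll back on error
        # 1. Update warehouse rows whose key also appears in staging.
        conn.execute("""
            update warehouse
               set amount = (select s.amount from staging s
                              where s.item_key = warehouse.item_key)
             where exists (select 1 from staging s
                            where s.item_key = warehouse.item_key)
        """)
        # 2. Insert staged rows whose key is not in the warehouse yet.
        conn.execute("""
            insert into warehouse (item_key, amount)
              select s.item_key, s.amount
                from staging s
               where not exists (select 1 from warehouse w
                                  where w.item_key = s.item_key)
        """)
        # 3. Clear staging for the next run (stand-in for TRUNCATE).
        conn.execute("delete from staging")

merge_staging(conn)
final_rows = conn.execute(
    "select item_key, amount from warehouse order by item_key"
).fetchall()
print(final_rows)
```

Running all three statements in one transaction means a failure part-way through leaves both tables as they were, so the next run can simply retry.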

answered Oct 19 '22 00:10

Mr Shoubs


Tags: database, etl, ssis, business-intelligence, ssas

alex25
