 

Is a snowflake model better than indexing?

Here is the problem: I have a sales information table with columns like (Primary Key ID, Product Name, Product ID, Store Name, Store ID, Sales Date). I want to do analysis such as drilling up and drilling down on store, product, and sales date.

There are two design options I am considering:

  1. Create an individual index on each of the columns Product Name, Product ID, Store Name, Store ID, and Sales Date;
  2. Use the data warehouse snowflake model, treating the current sales information table as the fact table and creating product, store, and sales date dimension tables.

I have heard that the snowflake model gives better analysis performance. But why is it better than indexes on the related columns, from a database design perspective?

thanks in advance, Lin

Lin Ma asked Oct 22 '22 19:10



1 Answer

Knowing your app's usage patterns and what you want to optimize for is important. Here are a few reasons (among many) to choose one over the other.

Normalized Snowflake PROs:

Faster queries and lower disk and memory requirements. Because each normalized row holds only short keys rather than longer text fields, your primary fact table becomes much smaller. Even when an index is used, partial table scans are often still required (unless the query can be answered directly from the index itself), and smaller rows mean fewer disk reads and faster access.
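
To make the layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (`dim_store`, `dim_product`, `fact_sales`) and the sample rows are made up for illustration and are not from the question; the point is only that the fact table carries short integer keys and the names live in the dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: the fact table stores only keys and the date;
# the longer text fields live once in the dimension tables.
conn.executescript("""
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, store_name   TEXT NOT NULL);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL);
CREATE TABLE fact_sales (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    store_id   INTEGER NOT NULL REFERENCES dim_store(store_id),
    sales_date TEXT    NOT NULL
);
""")

# Made-up sample data.
conn.executemany("INSERT INTO dim_store VALUES (?, ?)",
                 [(1, "Downtown"), (2, "Airport")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(10, "Widget"), (20, "Gadget")])
conn.executemany(
    "INSERT INTO fact_sales (product_id, store_id, sales_date) VALUES (?, ?, ?)",
    [(10, 1, "2022-10-01"), (20, 1, "2022-10-01"),
     (10, 2, "2022-10-02"), (10, 1, "2022-10-03")],
)

# Drill up: number of sales per store.
by_store = conn.execute("""
    SELECT s.store_name, COUNT(*)
    FROM fact_sales f
    JOIN dim_store s ON s.store_id = f.store_id
    GROUP BY s.store_name
    ORDER BY s.store_name
""").fetchall()

# Drill down: number of sales per store and product.
by_store_product = conn.execute("""
    SELECT s.store_name, p.product_name, COUNT(*)
    FROM fact_sales f
    JOIN dim_store s   ON s.store_id   = f.store_id
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY s.store_name, p.product_name
    ORDER BY s.store_name, p.product_name
""").fetchall()

print(by_store)
print(by_store_product)
```

Whether this actually beats a wide table with indexes depends on your data volume and database engine, so measure on your own workload.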

Easier modifications and better data integrity. Say a store changes its name. In the snowflake model you change one row, whereas in a large denormalized table you have to change every row where the name appears, and you will often end up with spelling errors and multiple variations of the same name.
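
The store rename case can be sketched like this, again with Python's sqlite3 and hypothetical table names (`dim_store`, `fact_sales`, `wide_sales`) and made-up rows. It just counts how many rows each design has to touch for the same rename.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: a normalized pair (dim_store + fact_sales)
# next to a denormalized wide table holding the same three sales.
conn.executescript("""
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, store_name TEXT NOT NULL);
CREATE TABLE fact_sales (
    id         INTEGER PRIMARY KEY,
    store_id   INTEGER NOT NULL REFERENCES dim_store(store_id),
    sales_date TEXT    NOT NULL
);
CREATE TABLE wide_sales (
    id         INTEGER PRIMARY KEY,
    store_id   INTEGER NOT NULL,
    store_name TEXT    NOT NULL,
    sales_date TEXT    NOT NULL
);
""")

dates = ["2022-10-01", "2022-10-02", "2022-10-03"]
conn.execute("INSERT INTO dim_store VALUES (1, 'Downtown')")
conn.executemany("INSERT INTO fact_sales (store_id, sales_date) VALUES (1, ?)",
                 [(d,) for d in dates])
conn.executemany(
    "INSERT INTO wide_sales (store_id, store_name, sales_date) VALUES (1, 'Downtown', ?)",
    [(d,) for d in dates],
)

# Normalized: the name is stored once, so the rename touches one row.
normalized_rows = conn.execute(
    "UPDATE dim_store SET store_name = 'City Centre' WHERE store_id = 1"
).rowcount

# Denormalized: the name is repeated on every sale, so every sale row changes.
wide_rows = conn.execute(
    "UPDATE wide_sales SET store_name = 'City Centre' WHERE store_id = 1"
).rowcount

print(normalized_rows, wide_rows)
```

With three sample sales the difference is one row versus three; on a real sales table the wide update grows with the number of sales for that store.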

Denormalized Wide Table PROs:

Faster single-record loads. When you most often load just a single record or a small number of records, having all your data together in one row incurs only a single cache miss or disk read, whereas in the snowflake model the DB might have to read from multiple tables in different disk locations. This is closer to how NoSQL databases store the "objects" associated with a key.
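
As a rough illustration of the single-record case, the sketch below loads the same sale from a wide table and from normalized tables. It uses Python's sqlite3, and the table names, the helper functions `load_wide` and `load_normalized`, and the data are all hypothetical. It shows only the shape of the two lookups (one table versus three), not actual timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables holding the same single sale in both layouts.
conn.executescript("""
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, store_name   TEXT NOT NULL);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL);
CREATE TABLE fact_sales (
    id         INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    store_id   INTEGER NOT NULL REFERENCES dim_store(store_id),
    sales_date TEXT    NOT NULL
);
CREATE TABLE wide_sales (
    id           INTEGER PRIMARY KEY,
    product_id   INTEGER NOT NULL,
    product_name TEXT    NOT NULL,
    store_id     INTEGER NOT NULL,
    store_name   TEXT    NOT NULL,
    sales_date   TEXT    NOT NULL
);
INSERT INTO dim_store   VALUES (1, 'Downtown');
INSERT INTO dim_product VALUES (10, 'Widget');
INSERT INTO fact_sales  VALUES (100, 10, 1, '2022-10-01');
INSERT INTO wide_sales  VALUES (100, 10, 'Widget', 1, 'Downtown', '2022-10-01');
""")

def load_wide(sale_id):
    # One table, one primary-key lookup: everything is in the same row.
    return conn.execute(
        "SELECT id, product_name, store_name, sales_date FROM wide_sales WHERE id = ?",
        (sale_id,),
    ).fetchone()

def load_normalized(sale_id):
    # Three tables: the fact row plus one lookup in each dimension table.
    return conn.execute("""
        SELECT f.id, p.product_name, s.store_name, f.sales_date
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_store s   ON s.store_id   = f.store_id
        WHERE f.id = ?
    """, (sale_id,)).fetchone()

print(load_wide(100))
print(load_normalized(100))
```

Both calls return the same record; the normalized version simply has to visit more tables to assemble it.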

dkamins answered Oct 27 '22 22:10
