audit table vs. Type 2 Slowly Changing Dimension

In SQL Server 2008+, we'd like to enable tracking of historical changes to a "Customers" table in an operational database.

It's a new table and our app controls all writing to the database, so we don't need evil hacks like triggers. Instead we will build the change tracking into our business object layer, but we need to figure out the right database schema to use.

The number of rows will be under 100,000, and the number of changes per record will average 1.5 per year.

There are at least two ways we've been looking at modelling this:

  1. As a Type 2 Slowly Changing Dimension table called CustomersHistory, with columns for EffectiveStartDate, EffectiveEndDate (set to NULL for the current version of the customer), and auditing columns like ChangeReason and ChangedByUsername. Then we'd build a Customers view over that table, filtered to EffectiveEndDate IS NULL. Most parts of our app would query through that view, and only the parts that need to be history-aware would query the underlying table. For performance, we could materialize the view and/or add a filtered index with the predicate EffectiveEndDate IS NULL.

  2. With a separate audit table. Every change to a customer record writes once to the Customers table and again to a CustomerHistory audit table.
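A minimal sketch of option 1, using Python's built-in sqlite3 module as a stand-in for SQL Server so that it can be run anywhere. The `Name` column, the composite primary key, the index name, and the helper functions `save_customer` and `customer_as_of` are assumptions made for illustration; they are not part of the design described above. SQLite's partial index plays the role of the filtered index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table holds every version; the current version has a NULL end date.
CREATE TABLE CustomersHistory (
    CustomerId         INTEGER NOT NULL,
    Name               TEXT    NOT NULL,  -- hypothetical business column
    EffectiveStartDate TEXT    NOT NULL,  -- ISO-8601 text, so string order = date order
    EffectiveEndDate   TEXT,              -- NULL marks the current version
    ChangeReason       TEXT,
    ChangedByUsername  TEXT,
    PRIMARY KEY (CustomerId, EffectiveStartDate)
);
-- Partial (filtered) unique index: at most one current row per customer.
CREATE UNIQUE INDEX IX_CustomersHistory_Current
    ON CustomersHistory (CustomerId) WHERE EffectiveEndDate IS NULL;
-- The predicate must be IS NULL; a comparison written as "= NULL" matches no rows.
CREATE VIEW Customers AS
    SELECT CustomerId, Name FROM CustomersHistory WHERE EffectiveEndDate IS NULL;
""")

def save_customer(conn, customer_id, name, when, reason, user):
    """Close the current version (if any) and insert the new one in one transaction."""
    with conn:
        conn.execute(
            "UPDATE CustomersHistory SET EffectiveEndDate = ? "
            "WHERE CustomerId = ? AND EffectiveEndDate IS NULL",
            (when, customer_id),
        )
        conn.execute(
            "INSERT INTO CustomersHistory "
            "(CustomerId, Name, EffectiveStartDate, ChangeReason, ChangedByUsername) "
            "VALUES (?, ?, ?, ?, ?)",
            (customer_id, name, when, reason, user),
        )

def customer_as_of(conn, customer_id, when):
    """History-aware read: the name that was in effect at the given date."""
    row = conn.execute(
        "SELECT Name FROM CustomersHistory "
        "WHERE CustomerId = ? AND EffectiveStartDate <= ? "
        "AND (EffectiveEndDate IS NULL OR EffectiveEndDate > ?)",
        (customer_id, when, when),
    ).fetchone()
    return row[0] if row else None

save_customer(conn, 1, "Acme Ltd", "2014-01-01", "created", "alice")
save_customer(conn, 1, "Acme Inc", "2014-06-20", "renamed", "bob")

print(conn.execute("SELECT * FROM Customers").fetchall())
print(customer_as_of(conn, 1, "2014-03-01"))
```

Ordinary reads go through the `Customers` view, while `customer_as_of` queries the underlying table directly. In this sketch every write is an UPDATE plus an INSERT against the same table, so the two statements need to share a transaction.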

From a quick review of StackOverflow questions, #2 seems to be much more popular. But is this because most DB apps have to deal with legacy and rogue writers?

Given that we're starting from a blank slate, what are pros and cons of either approach? Which would you recommend?

Justin Grant asked Jun 20 '14 20:06




1 Answer

In general, the issue with SCD Type 2 is that if attribute values change very frequently, you end up with a very fat dimension table. Joining this growing dimension table to a huge fact table gradually slows down query performance. It's like slow poisoning: initially you don't see the impact, and by the time you notice it, it's too late!

Now, I understand that you will create a separate materialized view filtered on EffectiveEndDate IS NULL, and that it will be used in most of your joins. In addition, your data volume is comparatively low (under 100,000 rows). With an average of only 1.5 changes per record per year, I don't think data volume or query performance is going to be a problem for you in the near future.
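As a rough check on that estimate, the growth of the history table can be worked out from the figures in the question. The sketch below assumes the customer count stays fixed at the 100,000 upper bound and that every customer changes at the average rate; both are simplifying assumptions, and `history_rows` is a hypothetical helper.

```python
CUSTOMERS = 100_000        # upper bound on row count, from the question
CHANGES_PER_YEAR = 1.5     # average changes per record per year, from the question

def history_rows(years):
    """Rows in a Type 2 table: one initial row per customer plus one row per change."""
    return int(CUSTOMERS * (1 + CHANGES_PER_YEAR * years))

for years in (1, 5, 10):
    print(years, history_rows(years))
# 1 250000
# 5 850000
# 10 1600000
```

Under these assumptions the table gains at most 150,000 rows per year, and the view over current rows never exceeds 100,000.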

In other words, your table is truly a slowly changing dimension (as opposed to a rapidly changing dimension, where your option #2 is a better fit). In your case, I would prefer option #1.
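For comparison, here is a minimal sketch of what option #2 might look like, again using Python's sqlite3 module as a stand-in for SQL Server. The `Name`, `AuditId`, and `ChangedAt` columns and the `save_customer` helper are assumptions made for illustration; only the table names and the ChangeReason and ChangedByUsername columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Current state only: one row per customer, updated in place.
CREATE TABLE Customers (
    CustomerId INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL            -- hypothetical business column
);
-- Append-only audit trail: one row per write to Customers.
CREATE TABLE CustomerHistory (
    AuditId           INTEGER PRIMARY KEY AUTOINCREMENT,
    CustomerId        INTEGER NOT NULL,
    Name              TEXT    NOT NULL,
    ChangedAt         TEXT    NOT NULL,  -- ISO-8601 text
    ChangeReason      TEXT,
    ChangedByUsername TEXT
);
""")

def save_customer(conn, customer_id, name, when, reason, user):
    """Write the current row and its audit row in a single transaction."""
    with conn:
        cur = conn.execute(
            "UPDATE Customers SET Name = ? WHERE CustomerId = ?",
            (name, customer_id),
        )
        if cur.rowcount == 0:  # no existing row, so this is a new customer
            conn.execute(
                "INSERT INTO Customers (CustomerId, Name) VALUES (?, ?)",
                (customer_id, name),
            )
        conn.execute(
            "INSERT INTO CustomerHistory "
            "(CustomerId, Name, ChangedAt, ChangeReason, ChangedByUsername) "
            "VALUES (?, ?, ?, ?, ?)",
            (customer_id, name, when, reason, user),
        )

save_customer(conn, 1, "Acme Ltd", "2014-01-01", "created", "alice")
save_customer(conn, 1, "Acme Inc", "2014-06-20", "renamed", "bob")

print(conn.execute("SELECT * FROM Customers").fetchall())
print(conn.execute("SELECT Name, ChangedAt FROM CustomerHistory ORDER BY AuditId").fetchall())
```

Here the table that joins touch stays at one row per customer no matter how often values change, which is why this layout copes better with rapidly changing data. The cost is that every change writes to two tables.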

hashbrown answered Sep 28 '22 08:09