How to handle Slowly Changing Dimension Type 2 in Redshift?

Tags: postgresql, amazon-redshift

I want to track username changes over time.

I have the following users table in Redshift:

id     username     valid_from     valid_to     current    
--------------------------------------------------------
1      joe1         2015-01-01     2015-01-15   No
1      joe2         2015-01-15     NULL         Yes

My source data is in RDS Postgres. I'm considering several options for handling this:

1) Create a users_history table and track changes inside the RDS Postgres db. This requires changes to my app, and the table could get huge.

2) Have an ETL process query the users source table every 5 minutes or so, looking for new changes (sorted by last updated_at), and dump them to DynamoDB.

3) Have an ETL process dump data to S3, then COPY it into a temporary table inside Redshift and run the update query there.

Can you advise on which approach is scalable and easy to maintain in the long run? Keep in mind that these tables can be massive and I'll be tracking SCD for many tables.

Thanks.

Update 1: I chatted with AWS support and they pointed me to this, which seems like a good solution: http://docs.aws.amazon.com/redshift/latest/dg/merge-specify-a-column-list.html


asked Dec 15 '15 19:12

1 Answer

In terms of SQL/ETL implementation, Redshift supports most of what RDS will support for this kind of work. So your decision should be based on the constraints and expectations you have for the database.

Redshift is a read-optimized system, so running updates every few minutes will likely slow down queries. (Micro-ETLs are generally not recommended on Redshift.)

On the other hand, if you are likely to have huge tables, Redshift will perform better than most row-store databases (MySQL, PostgreSQL, etc.). This performance gap will widen as your data grows, since Redshift is designed for larger scales than traditional systems.
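As a rough illustration of option 3 from the question (load a batch into a staging table, then update the dimension in bulk rather than row by row), here is a minimal sketch of the two SCD Type 2 statements. It uses Python's built-in sqlite3 as a local stand-in for Redshift so that it can be run anywhere; on Redshift the staging table would presumably be filled with COPY from S3, and the exact SQL syntax may need adjusting. The names `users_staging`, `is_current` (used instead of `current`, which can clash with SQL keywords) and `apply_scd2` are assumptions made for this example, not anything from the question or the AWS docs.

```python
import sqlite3


def apply_scd2(conn, load_date):
    """Apply one staged batch to the users dimension, SCD Type 2 style."""
    cur = conn.cursor()
    # Step 1: close out current rows whose username differs from the staged one.
    cur.execute(
        """
        UPDATE users
           SET valid_to = ?, is_current = 'No'
         WHERE is_current = 'Yes'
           AND EXISTS (SELECT 1 FROM users_staging s
                        WHERE s.id = users.id
                          AND s.username <> users.username)
        """,
        (load_date,),
    )
    # Step 2: insert a new current row for every staged id that has no
    # current row (brand-new ids, plus the ids closed out in step 1).
    cur.execute(
        """
        INSERT INTO users (id, username, valid_from, valid_to, is_current)
        SELECT s.id, s.username, ?, NULL, 'Yes'
          FROM users_staging s
         WHERE NOT EXISTS (SELECT 1 FROM users u
                            WHERE u.id = s.id AND u.is_current = 'Yes')
        """,
        (load_date,),
    )
    # Step 3: empty the staging table so the next batch starts clean.
    cur.execute("DELETE FROM users_staging")
    conn.commit()


# Demo data modelled on the table in the question (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER, username TEXT, valid_from TEXT,
                        valid_to TEXT, is_current TEXT);
    CREATE TABLE users_staging (id INTEGER, username TEXT);
    INSERT INTO users VALUES (1, 'joe1', '2015-01-01', NULL, 'Yes');
    INSERT INTO users_staging VALUES (1, 'joe2');
    """
)
apply_scd2(conn, "2015-01-15")
rows = conn.execute(
    "SELECT id, username, valid_from, valid_to, is_current "
    "FROM users ORDER BY id, valid_from"
).fetchall()
for row in rows:
    print(row)
```

Running the batch less often (hourly or daily, say) keeps the number of UPDATE passes low, which fits the point above about avoiding micro-ETLs. Unchanged usernames in a batch leave the dimension untouched, so re-staging the same data should be harmless in this sketch.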


answered Oct 20 '22 22:10

Paladin

Kien Pham
