I'm trying to figure out the different types of sort keys in Amazon Redshift, and I came across a strange warning that isn't explained: <blockquote> Important: Don’t use an interleaved sort key on columns with monotonically increasing attributes, such as identity columns, dates, or timestamps. </blockquote> And yet, in their own example, Amazon uses an interleaved key on a date column with good performance. So my question is: what's the explanation for this warning, and should I take it seriously? More precisely, is there a problem with using an interleaved key on a timestamp column?

Why not use a timestamp column with an interleaved sort key?

2 Answers

I think it might have been explained later on when they describe issues around vacuuming/reindexing:

When tables are initially loaded, Amazon Redshift analyzes the distribution of the values in the sort key columns and uses that information for optimal interleaving of the sort key columns. As a table grows, the distribution of the values in the sort key columns can change, or skew, especially with date or timestamp columns. If the skew becomes too large, performance might be affected.
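To make the skew described in that passage concrete, here is a toy model. It assumes (this is not Redshift's documented internal algorithm) that each interleaved column is mapped to a small number of equal-frequency buckets whose boundaries are chosen from the data present at the initial load. The helper names `bucket_boundaries` and `bucket_of` are hypothetical. Because a timestamp keeps growing, every later value lands beyond the last boundary:

```python
from bisect import bisect_right
from collections import Counter


def bucket_boundaries(sample, n_buckets):
    """Hypothetical helper: pick n_buckets - 1 cut points (equal-frequency buckets)."""
    s = sorted(sample)
    return [s[len(s) * i // n_buckets] for i in range(1, n_buckets)]


def bucket_of(value, boundaries):
    """Hypothetical helper: index of the bucket a value falls into (0 .. len(boundaries))."""
    return bisect_right(boundaries, value)


# Initial load: event days 0..99 (e.g. days since some epoch).
initial_days = list(range(100))
bounds = bucket_boundaries(initial_days, n_buckets=8)

# Later loads: days keep increasing past anything seen at the initial load.
new_days = list(range(100, 150))
new_buckets = {bucket_of(d, bounds) for d in new_days}
all_buckets = Counter(bucket_of(d, bounds) for d in initial_days + new_days)

print(bounds)                      # [12, 25, 37, 50, 62, 75, 87]
print(new_buckets)                 # {7}
print(all_buckets[7], "of", 150)   # 63 of 150
```

In this model every new row shares the same (last) bucket for the date column, so that column stops helping to distinguish the new rows from each other until the boundaries are recomputed, which is roughly what reindexing does.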

So if that is the only reason, it just means the table will need more frequent index maintenance (reindexing).
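That maintenance is the `VACUUM REINDEX` command, and Redshift reports per-column skew in the `interleaved_skew` column of the `SVV_INTERLEAVED_COLUMNS` system view. Below is a minimal sketch that turns skew readings into reindex statements. It assumes you have already fetched `(table_name, interleaved_skew)` pairs yourself (the view itself identifies tables by ID, so a lookup to names is needed); the function name, the table names and the `1.4` threshold are illustrative assumptions, not values taken from the question or answers.

```python
def reindex_statements(skew_rows, threshold=1.4):
    """Hypothetical helper: emit VACUUM REINDEX for tables whose worst skew exceeds threshold.

    skew_rows: iterable of (table_name, interleaved_skew) pairs, one per
    interleaved sort key column (assumed already fetched from the cluster).
    threshold: assumed tuning value; 1.0 would mean no skew at all.
    """
    worst = {}
    for table, skew in skew_rows:
        # A table is only as good as its most skewed interleaved column.
        worst[table] = max(skew, worst.get(table, 0.0))
    return [
        f"VACUUM REINDEX {table};"
        for table, skew in sorted(worst.items())
        if skew > threshold
    ]


# Hypothetical readings: "events" has a timestamp column that has drifted.
rows = [("events", 1.02), ("events", 2.31), ("customers", 1.10)]
print(reindex_statements(rows))  # ['VACUUM REINDEX events;']
```

With a monotonically increasing sort column, the expectation from the answer above is simply that a check like this would fire more often.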

answered Oct 14 '22 21:10

Łukasz Kamiński

From https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data.html

As you add rows to a sorted table that already contains data, the unsorted region grows, which has a significant effect on performance. The effect is greater when the table uses interleaved sorting, especially when the sort columns include data that increases monotonically, such as date or timestamp columns.

The key point in the original quote is not that the data is a date or timestamp; it's that it increases "monotonically", which in this context presumably means increasing sequentially, such as an event timestamp or an ID number.
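A toy model shows why monotonic appends hurt interleaved sorting more than compound sorting. It assumes interleaved order can be approximated by a two-column z-order (bit interleaving of bucket indexes) and compound order by plain tuple ordering; Redshift's actual on-disk layout is not exposed, and `z_value`, `is_sorted` and the bucket values are hypothetical.

```python
def z_value(ts_bucket, cust_bucket, bits=2):
    """Interleave the bits of two bucket indexes, most significant bit first.

    At each bit level the timestamp bit is placed just above the customer bit,
    so neither column dominates the ordering the way the leading column of a
    compound key does.
    """
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((ts_bucket >> i) & 1)
        z = (z << 1) | ((cust_bucket >> i) & 1)
    return z


def is_sorted(keys):
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Existing rows: timestamp buckets 0..2, customer buckets 0..3.
existing = [(t, c) for t in range(3) for c in range(4)]
# A later load: every row has a newer timestamp than anything already stored.
new_batch = [(3, c) for c in range(4)]

# Compound key (ts, cust): tuple ordering is the compound sort order.
compound_table = sorted(existing) + new_batch
compound_still_sorted = is_sorted(compound_table)

# Interleaved key: rows ordered by z-value, new batch appended at the end.
interleaved_table = sorted(existing, key=lambda r: z_value(*r)) + new_batch
interleaved_still_sorted = is_sorted([z_value(*r) for r in interleaved_table])

max_existing_z = max(z_value(*r) for r in existing)
misplaced = [r for r in new_batch if z_value(*r) < max_existing_z]

print(compound_still_sorted)     # True
print(interleaved_still_sorted)  # False
print(misplaced)                 # [(3, 0), (3, 1)]
```

With the compound key, the newer timestamps naturally extend the sorted order at the end. With the interleaved key, some new rows (z-values 10 and 11) belong between existing rows (z-values 9 and 12), so in this model the appended batch stays in the unsorted region until the table is vacuumed or reindexed.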

answered Oct 14 '22 19:10

Nathan Griffiths

Tags:

amazon-redshift

senior_citizen_
