What does the column skew_sorkey1 in Amazon Redshift's svv_table_info imply?

Tags:

amazon-redshift

Redshift's documentation (http://docs.aws.amazon.com/redshift/latest/dg/r_SVV_TABLE_INFO.html) states that the definition of the column skew_sortkey1 is - Ratio of the size of the largest non-sort key column to the size of the first column of the sort key, if a sort key is defined. Use this value to evaluate the effectiveness of the sort key.

What does this imply? What does it mean if this value is large? or alternatively small?

Thanks!

606

asked Mar 08 '16 01:03

elvikingo

2 Answers

A large skew_sortkey1 value means that the ratio of the size of largest non-sort key column to the first column of sort key is large which means row offsets in one disk block for the sort key corresponds to more disk blocks in the data column.

For example lets say skew_sortkey1 value is 5 for a table. Now the row offsets in one disk block for the sort key corresponds to 5 disk blocks for other data columns. Zone map stores the min and max value for the sort key disk block, so when you query this table with a where clause on sort key redshift identifies the sort key block which contains this data (block min < where clause value < block_max) and fetches the row offsets for that column. Now since the skew_sortkey1 is 5, it has to fetch 5 blocks for the data columns before filtering the records to the desired ones.

So to conclude having a high skew_sortkey1 value is not desirable.

189

answered Nov 09 '22 19:11

ab_

Sortkeys define the order in which each field of a table row are stored in a disk block of redshift. This means that column data belonging to a sort key region gets stored together in a single disk block (1 MB size) . Since redshift applies compression to different columns, sortkey columns would have a potential advantage of storing similar data within the same disk block, which leads to higher compression/more efficient storage of data. The same thing cannot be said about other non-sortkey columns.

The column skew_sortkey1 in SVV_TABLE_INFO quantifies the effectiveness of the first sort key within the table. The returned value allows a user to determine whether the selected sort key has improved the compression/efficiency of data storage.

answered Nov 09 '22 18:11

rahulbmv

Related questions
                            
                                How to know the type of a value returned by a Redshift query?
                            
                                What's the best way to create RFC-4180-friendly CSV files from Amazon Redshift UNLOAD?
                            
                                How can I create on-demand reports once they become too slow for our DB?
                            
                                Main causes for leader node to be at high CPU
                            
                                Escaping delimiter in Amazon Redshift COPY command
                            
                                Redshift : could not complete because of conflict with concurrent transaction
                            
                                How to select tables from different databases in Redshift
                            
                                Inaccurate COUNT DISTINCT Aggregation with Date dimension in Google Data Studio
                            
                                Long runtime when query is executed the first time in RedShift
                            
                                AWS RedShift - .NET Core (ODBC Support?)
                            
                                Can AWS Redshift drop a table that is wrapped in transaction?
                            
                                Redshift - Adding a column, do we have to change our previous CSVs to include it?
                            
                                Using RedShift as an additional Django Database
                            
                                How to do dynamic regex matching, in redshift?
                            
                                Using CTE and Update in Redshift
                            
                                Unable to create cluster. VPC is greyed out when launching Redshift Cluster
                            
                                MongoDB into AWS Redshift
                            
                                Synchronize data from MySql to Amazon RedShift
                            
                                Version control for Tableau

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With