AWS has recently released a new encoding format for Redshift, AZ64, about which they say:
Compared to ZSTD encoding, AZ64 consumed 5–10% less storage, and was 70% faster
However, when I run ANALYZE COMPRESSION my_table, I still receive ZSTD as the recommended encoding for all of its columns.
So is AZ64 really recommended over ZSTD? Should I naively prefer AZ64 wherever it is possible to use it?
ANALYZE COMPRESSION is an advisory tool; it doesn't modify the column encodings of the table. To apply the suggested encodings, you have to recreate the table: create a new table with the same schema and the recommended encodings, then deep-copy the data into it. Recreating an uncompressed table with appropriate encoding schemes can significantly reduce its on-disk footprint.
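A minimal sketch of that deep copy (table name, columns, and encodings are hypothetical; use whatever ANALYZE COMPRESSION or your own testing suggests):

    -- Create a new table with the desired encodings (hypothetical schema).
    CREATE TABLE my_table_new (
        id         BIGINT       ENCODE az64,
        created_at TIMESTAMP    ENCODE az64,
        note       VARCHAR(256) ENCODE zstd
    );

    -- Deep-copy the data, then swap the tables.
    INSERT INTO my_table_new SELECT * FROM my_table;
    DROP TABLE my_table;
    ALTER TABLE my_table_new RENAME TO my_table;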
A compression encoding specifies the type of compression that is applied to a column of data values as rows are added to a table. ENCODE AUTO is the default for tables; with it, Amazon Redshift automatically manages the compression encoding for all columns in the table.
Text255 works well with VARCHAR columns in which the same words recur.
LZO encoding provides a very high compression ratio with good performance. LZO encoding works especially well for CHAR and VARCHAR columns that store very long character strings. It is especially good for free-form text, such as product descriptions, user comments, or JSON strings.
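As an illustration, a hypothetical DDL sketch pairing each column with the encoding suggested for it above (table, column names, and sizes are made up):

    CREATE TABLE product_reviews (
        product_id  BIGINT         ENCODE az64,
        category    VARCHAR(50)    ENCODE text255,  -- short strings where the same words recur
        review_body VARCHAR(65535) ENCODE lzo       -- long free-form text or JSON strings
    );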
I got a reply from AWS Support on this question:
About your question of preferring AZ64 instead of ZSTD where possible: yes, you can do it, given that AZ64 provides better performance compared to ZSTD.
For further explanation:
Yes, AZ64 is better than ZSTD. It has compression comparable to ZSTD, but much better performance, which you may already have noticed in use. As of now, the ANALYZE COMPRESSION command doesn't support AZ64, and I don't have an ETA on when AZ64 will be available with ANALYZE COMPRESSION. I suggest you keep an eye on
- https://docs.aws.amazon.com/redshift/latest/mgmt/rs-mgmt-cluster-version-notes.html
- https://aws.amazon.com/redshift/whats-new/
for any updates on AWS Redshift. I have verified this with the internal service team.
ANALYZE COMPRESSION is an advisory tool; it recommends the optimal column encoding based on the column's data. When ZSTD first came out, it also took a while before it was added to the ANALYZE COMPRESSION command.
ZSTD can be used on any data type, although some won't benefit from it as much as others. You can naively apply it to everything and it works fine.
AZ64 can only be applied to these data types (see the sketch after this list):
SMALLINT
INTEGER
BIGINT
DECIMAL
DATE
TIMESTAMP
TIMESTAMPTZ
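A minimal sketch of the "AZ64 where possible, ZSTD otherwise" approach from the question, assuming a hypothetical events table:

    CREATE TABLE events (
        event_id    BIGINT        ENCODE az64,  -- AZ64-eligible
        amount      DECIMAL(12,2) ENCODE az64,  -- AZ64-eligible
        event_date  DATE          ENCODE az64,  -- AZ64-eligible
        created_at  TIMESTAMPTZ   ENCODE az64,  -- AZ64-eligible
        description VARCHAR(1000) ENCODE zstd   -- not AZ64-eligible, fall back to ZSTD
    );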
I ran an experiment to test the compression factor. I was surprised to discover it doesn't always make things smaller.
- CREATE TABLE DDL based on the original table
- VACUUM FULL <tablename> TO 99 PERCENT for both the old and the new table
- ANALYZE <tablename> for both the old and the new table
- A query to check per-column sizes, borrowed from https://stackoverflow.com/a/33388886/1335793 (see the sketch below)
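For reference, a query along those lines (a sketch that counts 1 MB blocks per column from the system tables; the linked answer may differ in its details):

    -- Each block in Redshift is 1 MB, so blocks per column ~= on-disk size in MB.
    SELECT TRIM(p.name) AS table_name,
           b.col,
           COUNT(*)     AS size_mb
    FROM stv_blocklist b
    JOIN stv_tbl_perm p
      ON b.tbl = p.id AND b.slice = p.slice
    WHERE TRIM(p.name) = 'my_table'   -- hypothetical table name
    GROUP BY 1, 2
    ORDER BY 2;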
The id column is a primary key, so it has very large cardinality; perhaps that helps?
Edit: I haven't done any performance comparison, so this is only part of the story. Overall the size of the table is smaller, even if some fields weren't.