Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AZ64 compression format performance

AWS Redshift has recently released their own new encoding format AZ64, for which they say:

Compared to ZSTD encoding, AZ64 consumed 5–10% less storage, and was 70% faster

When I use an ANALYZE COMPRESSION my_table I still receive ZSTD as an encoding format for all of its columns.

So is it really recommended as an encoding format over ZSTD? Shall I naively prefer AZ64 whenever possible to use it?

like image 383
Vzzarr Avatar asked Nov 21 '19 15:11

Vzzarr


People also ask

How do you analyze compression?

ANALYZE COMPRESSION is an advisory tool and doesn't modify the column encodings of the table. You can apply the suggested encoding by recreating the table or by creating a new table with the same schema. Recreating an uncompressed table with appropriate encoding schemes can significantly reduce its on-disk footprint.

What is compression encoding?

A compression encoding specifies the type of compression that is applied to a column of data values as rows are added to a table. ENCODE AUTO is the default for tables. Amazon Redshift automatically manages compression encoding for all columns in the table.

Which encoding works well for CHAR and VARCHAR columns with very small as well as long strings?

Text255 works well with VARCHAR columns in which the same words recur.

What is Lzo encoding?

PDFRSS. LZO encoding provides a very high compression ratio with good performance. LZO encoding works especially well for CHAR and VARCHAR columns that store very long character strings. They are especially good for free-form text, such as product descriptions, user comments, or JSON strings.


2 Answers

I got a reply from AWS Support on this question:

TL;DR

About your question preferring AZ64 instead of ZSTD were possible, yes you can do it.

Given AZ64 provide better performance compared to ZSTD

For further exaplanation:

Yes, AZ64 is better than ZSTD. It has comparable compression, when compared with ZSTD but greatly better performance, which is something you would have already learned using. As of now ANALYZE COMPRESSION command doesn't support AZ64, also I don't have an ETA on when AZ64 will be available with ANALYZE COMPRESSION. I will suggest you to keep an eye on

  • https://docs.aws.amazon.com/redshift/latest/mgmt/rs-mgmt-cluster-version-notes.html
  • https://aws.amazon.com/redshift/whats-new/

for any updates on AWS Redshift. I have verified this with the internal service team.

ANALYZE COMPRESSION is an advisory tool it recommends the optimal column encoding depending upon the columns.

like image 189
Vzzarr Avatar answered Oct 17 '22 18:10

Vzzarr


When ZSTD first came out it also took a while to be added to the analyze compression command.

ZSTD can be used on any datatype although some won't benefit from it as much as others. You can naively apply it to everything and it works fine.

AZ64 can only be applied to these datatypes:

SMALLINT
INTEGER
BIGINT
DECIMAL
DATE
TIMESTAMP
TIMESTAMPTZ

I ran an experiment to test the compression factor. I was surprised to discover it doesn't always make things smaller.

Steps

  1. Generated the create table DDL for the original table
  2. Changed the name of the table and the encoding for valid columns
  3. created the table Inserted into the new table from the old table ran
  4. VACUUM FULL <tablename> TO 99 PERCENT for both old and new table
  5. ran ANALYZE <tablename> for both old and new table

Query I used to check column sizes borrowed from https://stackoverflow.com/a/33388886/1335793

Results

enter image description here

  • The id column is a primary key, so has a very large cardinality, perhaps that helps?
  • The sort_order column has values in the range 0-50 with more values closer to 0
  • The created_at timestamp ranges over many years with more data in recent times
  • completed_step is similar to sort order but the median is closer to 0

Edit: I haven't done any performance comparison so this is only part of the story. Overall the size of the table is smaller, even if some fields weren't.

like image 39
Davos Avatar answered Oct 17 '22 18:10

Davos