
Do Redshift column encodings affect query execution speed?

When creating data tables in Amazon Redshift, you can specify various encodings such as MOSTLY32 or BYTEDICT or LZO. Those are the compressions used when storing the columnar values on disk.

I am wondering if my choice of encoding is supposed to make a difference in query execution times. For example, if I make a column BYTEDICT would that make a difference over LZO when it comes to SELECTs, GROUP BYs or FILTERs?

Mendhak asked Dec 20 '22 13:12



1 Answer

Yes. The compression encoding determines how much disk storage a column takes up. Generally, the less storage a column uses, the less data a query has to read from disk, so the better query performance tends to be.

Which encoding is most beneficial to you, however, depends on your data type and its distribution. There is no guarantee that LZO will always be better than BYTEDICT, or vice versa. In my experience, the practical approach is to load some sample data into the intended table and then run ANALYZE COMPRESSION on it. Whatever Redshift suggests, I go with it. That has worked for me.
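To illustrate why the best encoding depends on the data's distribution, here is a rough Python sketch, not Redshift's actual implementation. It assumes a simplified byte-dictionary scheme (up to 256 distinct values get a one-byte index, everything else is stored at full width) and uses zlib purely as a stand-in for a general-purpose codec such as LZO. The function names and sample columns are made up for the example.

```python
import zlib


def raw_size(values, width):
    """Bytes needed to store every value uncompressed at a fixed width."""
    return len(values) * width


def byte_dict_size(values, width, max_entries=256):
    """Approximate size under a simplified byte-dictionary scheme.

    Assumption: the first `max_entries` distinct values each get a
    one-byte index; any other value is stored at full width. Real
    Redshift builds a dictionary per disk block and has extra overhead.
    """
    dictionary = {}
    size = 0
    for v in values:
        if v not in dictionary and len(dictionary) < max_entries:
            dictionary[v] = len(dictionary)
        size += 1 if v in dictionary else width
    # Add the space taken by the dictionary entries themselves.
    return size + len(dictionary) * width


def general_purpose_size(values, width):
    """Size after zlib, used only as a stand-in for a codec like LZO."""
    payload = b"".join(v.to_bytes(width, "little") for v in values)
    return len(zlib.compress(payload))


WIDTH = 4  # bytes per value, like a 4-byte integer column

# Hypothetical sample columns: few distinct values vs. all distinct values.
low_cardinality = [i % 10 for i in range(100_000)]
high_cardinality = list(range(100_000))

for name, column in [("low", low_cardinality), ("high", high_cardinality)]:
    print(
        name,
        "raw:", raw_size(column, WIDTH),
        "byte-dict:", byte_dict_size(column, WIDTH),
        "zlib:", general_purpose_size(column, WIDTH),
    )
```

In this toy model the dictionary scheme shrinks the low-cardinality column to about a quarter of its raw size, but gives no saving on the all-distinct column. On real tables, the numbers reported by ANALYZE COMPRESSION are the ones to trust.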

Rakesh Singh answered Jan 13 '23 12:01
