When creating data tables in Amazon Redshift, you can specify various encodings such as MOSTLY32 or BYTEDICT or LZO. Those are the compressions used when storing the columnar values on disk.
I am wondering if my choice of encoding is supposed to make a difference in query execution times. For example, if I make a column BYTEDICT would that make a difference over LZO when it comes to SELECTs, GROUP BYs or FILTERs?
Yes. The compression encoding used translates to amount of disk storage. Generally, the lower the storage the better would be query performance.
But, which encoding would be be more beneficial to you depends on your data type and its distribution. There is no gurantee that LZO will always be better than Bytedict or vice-a-versa. In my experience, I usually load some sample data in the intended table. Than do a analyze compression. Now whatever Redshift suggests, I go with it. That has worked for me.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With