Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are there downsides for creating a large VARCHAR value in Redshift?

The source data keeps throwing values for the field that keep getting bigger and bigger in length. Right now I'm using VARCHAR(200) but I might go for VARCHAR(400). Are there any disadvantages using a large number?

like image 641
simplycoding Avatar asked May 14 '15 23:05

simplycoding


People also ask

Does VARCHAR size matter Redshift?

Instead, consider the largest values you are likely to store in a VARCHAR column, for example, and size your columns accordingly. Because Amazon Redshift compresses column data very effectively, creating columns much larger than necessary has minimal impact on the size of data tables.

What is the max size for a VARCHAR in Redshift?

You can create an Amazon Redshift table with a TEXT column, but it is converted to a VARCHAR(256) column that accepts variable-length values with a maximum of 256 characters.

What are the limitations of Redshift?

Amazon Redshift doesn't support tables with column-level privileges for cross-database queries. Amazon Redshift doesn't support concurrency scaling for the queries that read data from other databases. Amazon Redshift doesn't support query catalog objects on AWS Glue or federated databases.


2 Answers

Don’t make it a practice to use the maximum column size for convenience.

Instead, consider the largest values you are likely to store in a VARCHAR column, for example, and size your columns accordingly. Because Amazon Redshift compresses column data very effectively, creating columns much larger than necessary has minimal impact on the size of data tables. During processing for complex queries, however, intermediate query results might need to be stored in temporary tables. Because temporary tables are not compressed, unnecessarily large columns consume excessive memory and temporary disk space, which can affect query performance.

http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-smallest-column-size.html

like image 148
Yuri Levinsky Avatar answered Oct 26 '22 17:10

Yuri Levinsky


What do you mean "downside"? There is a really big downside if you don't make the column big enough -- you can't use it to store the values you want to store there.

As for additional overhead, you don't need to worry about that. A varchar() type basically only takes up the storage needed for the value, plus a small overhead for the length. Also, "400" is not such a big number, especially when compared to "200".

So, if you need 400 bytes to store the value, change the table to store it. There may be overhead for changing the length of the value. I'm not sure if RedShift will feel the need to copy the data because the type changed. However, the effect on performance should be negligible.

like image 24
Gordon Linoff Avatar answered Oct 26 '22 19:10

Gordon Linoff