The source data keeps producing values for this field that grow longer and longer. Right now I'm using VARCHAR(200), but I might go to VARCHAR(400). Are there any disadvantages to using a large size?
You can create an Amazon Redshift table with a TEXT column, but it is converted to a VARCHAR(256) column that accepts variable-length values with a maximum of 256 characters.
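You can see that conversion for yourself; a minimal sketch, with a hypothetical table and column name:

    CREATE TABLE notes (body TEXT);

    -- Redshift stores the column as VARCHAR(256); confirm with:
    SELECT "column", type
    FROM pg_table_def
    WHERE tablename = 'notes';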
Amazon Redshift doesn't support tables with column-level privileges for cross-database queries. Amazon Redshift doesn't support concurrency scaling for queries that read data from other databases. Amazon Redshift doesn't support querying catalog objects on AWS Glue or federated databases.
Don’t make it a practice to use the maximum column size for convenience.
Instead, consider the largest values you are likely to store in a VARCHAR column, for example, and size your columns accordingly. Because Amazon Redshift compresses column data very effectively, creating columns much larger than necessary has minimal impact on the size of data tables. During processing for complex queries, however, intermediate query results might need to be stored in temporary tables. Because temporary tables are not compressed, unnecessarily large columns consume excessive memory and temporary disk space, which can affect query performance.
http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-smallest-column-size.html
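As a rough illustration of that advice (table, column names, and sizes here are hypothetical, chosen to fit the data rather than an arbitrary maximum):

    CREATE TABLE events (
        event_id   BIGINT,
        country    CHAR(2),        -- fixed-length two-letter code
        event_name VARCHAR(64),    -- longest observed value is well under 64 bytes
        payload    VARCHAR(400)    -- headroom over the current ~200-byte maximum
    );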
What do you mean, "downside"? There is a really big downside if you don't make the column big enough -- you can't store the values you need to store.
As for additional overhead, you don't need to worry about that. A varchar() type only takes up the storage needed for the value, plus a small overhead to record the length. Also, 400 is not such a big number, especially compared to 200.
So, if you need 400 bytes to store the value, alter the column to allow it. There may be a one-time cost to changing the length; I'm not sure whether Redshift needs to copy the data when the type changes. Either way, the ongoing effect on query performance should be negligible.
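For what it's worth, Redshift can widen a VARCHAR column in place with ALTER TABLE; a minimal sketch with hypothetical names (note that columns with certain compression encodings can't be altered this way):

    ALTER TABLE my_table
        ALTER COLUMN my_column TYPE VARCHAR(400);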