Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Redshift: Truncate VARCHAR value automatically on INSERT or maybe use max length?

When performing an INSERT, Redshift does not allow you to insert a string value that is longer/wider than the target field in the table. Observe:

CREATE TEMPORARY TABLE test (col VARCHAR(5));
-- result: 'Table test created'

INSERT INTO test VALUES('abcdefghijkl');
-- result: '[Amazon](500310) Invalid operation: value too long for type character varying(5);'

One workaround for this is to cast the value:

INSERT INTO test VALUES('abcdefghijkl'::VARCHAR(5));
-- result: 'INSERT INTO test successful, 1 row affected'

The annoying part about this is that now all of my code will have to have these cast statements on every INSERT for each VARCHAR field like this, or the application code will have to truncate the string before trying to construct the query; either way, it means that the column's width specification has to go into the application code, which is annoying.

Is there any better way of doing this with Redshift? It would be great if there was some option to just have the server truncate the string and perform (and maybe raise a warning) the way it does with MySQL.

One thing I could do is just declare these particular fields as a very large VARCHAR, perhaps even 65535 (the maximum).

create table analytics.testShort (a varchar(3));
create table analytics.testLong (a varchar(4096));
create table analytics.testSuperLong (a varchar(65535));

insert into analytics.testShort values('abc'); 
insert into analytics.testLong values('abc');
insert into analytics.testSuperLong values('abc');

-- Redshift reports the size for each table is the same, 4 mb

The one disadvantage of this approach I have found is that it will cause bad performance if this column is used in a group by/join/etc:

https://discourse.looker.com/t/troubleshooting-redshift-performance-extensive-guide/326 (search for VARCHAR)

I am wondering though if there is no harm otherwise if you plan to never use this field in group by, join, and the like.

Some things to note in my scenario: Yes, I really don't care about the extra characters that may be lost with truncation, and no, I don't have a way to enforce the length of the source text. I am capturing messages and URLs from external sources which generally fall into certain range in length of characters, but sometimes there are longer ones. It doesn't matter in our application if they get truncated or not in storage.

like image 491
olanmills Avatar asked Oct 14 '15 23:10

olanmills


People also ask

What is the max length of VARCHAR in Redshift?

TEXT and BPCHAR types You can create an Amazon Redshift table with a TEXT column, but it is converted to a VARCHAR(256) column that accepts variable-length values with a maximum of 256 characters.

Does VARCHAR size matter in Redshift?

Instead, consider the largest values you are likely to store in a VARCHAR column, for example, and size your columns accordingly. Because Amazon Redshift compresses column data very effectively, creating columns much larger than necessary has minimal impact on the size of data tables.

What are the limitations of Redshift?

Amazon Redshift doesn't support tables with column-level privileges for cross-database queries. Amazon Redshift doesn't support concurrency scaling for the queries that read data from other databases. Amazon Redshift doesn't support query catalog objects on AWS Glue or federated databases.


1 Answers

The only way to automatically truncate the strings to match the column width is using the COPY command with the option TRUNCATECOLUMNS

Truncates data in columns to the appropriate number of characters so that it fits the column specification. Applies only to columns with a VARCHAR or CHAR data type, and rows 4 MB or less in size.

Otherwise, you will have to take care of the length of your strings using one of these two methods:

  1. Explicitly CAST your values to the VARCHAR you want:

    INSERT INTO test VALUES(CAST('abcdefghijkl' AS VARCHAR(5)));

  2. Use the LEFT and RIGHT string functions to truncate your strings:

    INSERT INTO test VALUES(LEFT('abcdefghijkl', 5));

Note: CAST should be your first option because it handles multi-byte characters properly. LEFT will truncate based on the number of characters not bytes and if you have a multi-byte character in your string, you might end up exceeding the limit of your column.

like image 171
Ramy Nasr Avatar answered Sep 18 '22 04:09

Ramy Nasr