We have a file in S3 that is loaded into Redshift via the COPY command. The import is failing because a VARCHAR(20) value contains an Ä, which is being translated into .. during the COPY and is then too long for the 20 characters. I have verified that the data is correct in S3, but the COPY command does not understand the UTF-8 characters during import. Has anyone found a solution for this?
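For context, the load looks roughly like the following; the schema, table, bucket, and IAM role names are placeholders, not the real ones:

-- Placeholder names throughout; the failing column is declared as VARCHAR(20).
CREATE TABLE staging.customers (
    customer_name VARCHAR(20)
);

COPY staging.customers
FROM 's3://my-bucket/exports/customers.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
FORMAT AS CSV;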
The byte length for your VARCHAR column just needs to be larger. Multi-byte (UTF-8) characters are supported in the VARCHAR data type; however, the length that is provided is in bytes, NOT characters.
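You can see the difference directly in Redshift: LEN counts characters, while OCTET_LENGTH counts bytes.

SELECT LEN('Ä') AS character_count,       -- returns 1
       OCTET_LENGTH('Ä') AS byte_count;   -- returns 2, because Ä is a two-byte UTF-8 character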
The AWS documentation for Multibyte Character Load Errors states the following:

VARCHAR columns accept multibyte UTF-8 characters, to a maximum of four bytes.
Therefore, if you want the character Ä to be allowed, you need to allow 2 bytes for this character instead of 1 byte.
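In practice that means declaring the column wide enough in bytes for the data you expect. A minimal sketch, assuming the table and column are named my_table and my_column (Amazon Redshift lets you increase the size of a VARCHAR column in place):

ALTER TABLE my_table
    ALTER COLUMN my_column TYPE VARCHAR(40);  -- room for 20 two-byte UTF-8 characters

After widening the column, the same COPY command should load the Ä value without the length error.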
The AWS documentation for VARCHAR or CHARACTER VARYING states the following:

... so a VARCHAR(120) column consists of a maximum of 120 single-byte characters, 60 two-byte characters, 40 three-byte characters, or 30 four-byte characters.
For a list of UTF-8 characters and their byte lengths, this is a good reference: Complete Character List for UTF-8
Detailed information for the Unicode Character 'LATIN CAPITAL LETTER A WITH DIAERESIS' (U+00C4) can be found here.