
Text files uploaded to S3 are encoded strangely?

This is the strangest error, and I don't even know where to start figuring out what's wrong.

S3 had been working well, until suddenly one day (yesterday) it started garbling any text file uploaded to it. Whenever a text file contains Å, Ä, Ö or any other non-English UTF-8 characters, the file gets messed up. I've tried uploading with various clients as well as the AWS web interface. The upload goes fine, but when I download the file it's messed up. I've tried downloading it to my Mac, and I've tried downloading it to a Raspberry Pi running Linux. Same error.

Is there any encoding done by Amazon's S3 servers?!

asked Mar 14 '14 by Paolo



3 Answers

I had the same problem and solved it by adding charset=utf-8 to the file's Content-Type under Properties -> Metadata in the S3 console.

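If you prefer to make the same metadata change programmatically rather than in the console, you can copy the object onto itself with a replaced Content-Type. A minimal boto3 sketch, assuming a placeholder bucket "my-bucket" and key "example.txt":

import boto3

s3 = boto3.client("s3")

# Copy the object onto itself, replacing its metadata so the new
# Content-Type (with charset=utf-8) is stored on the existing object.
s3.copy_object(
    Bucket="my-bucket",
    Key="example.txt",
    CopySource={"Bucket": "my-bucket", "Key": "example.txt"},
    ContentType="text/plain; charset=utf-8",
    MetadataDirective="REPLACE",
)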

answered Oct 09 '22 by Toni Chaz


You can explicitly set "Content-Type: text/plain; charset=utf-8" on the file in the S3 console.

This will tell S3 to serve the file as UTF-8-encoded text.
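To confirm the header actually changed, you can read the object's metadata back without downloading the body. A small boto3 check, again with placeholder bucket and key names:

import boto3

s3 = boto3.client("s3")

# Fetch only the object's metadata and print its stored Content-Type.
response = s3.head_object(Bucket="my-bucket", Key="example.txt")
print(response["ContentType"])  # expect: text/plain; charset=utf-8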

answered Oct 09 '22 by Sony Kadavan


For those who are using boto3 (Python 3) to upload and are getting strange characters instead of accented ones (as in Portuguese and French, for example), Toni Chaz's and Sony Kadavan's answers gave me the hint for the fix. Adding ";charset=utf-8" to the ContentType argument when calling put_object was enough for the accents to show up correctly.

content_type="text/plain;charset=utf-8"
bucket_obj.put_object(Key=key, Body=data, ContentType=content_type)
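For completeness, a self-contained version of the snippet above; the bucket name, key, and file contents are only placeholders:

import boto3

s3 = boto3.resource("s3")
bucket_obj = s3.Bucket("my-bucket")  # placeholder bucket name

# Encode the text explicitly so the bytes and the declared charset match.
data = "Å, Ä, Ö".encode("utf-8")
bucket_obj.put_object(
    Key="example.txt",
    Body=data,
    ContentType="text/plain; charset=utf-8",
)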
answered Oct 09 '22 by Raphael Fernandes