
Files uploaded to S3 with S3BotoStorage end up with invalidly escaped Content-Type metadata

FACEPALM UPDATE: It turns out I had overlooked the fact that my default storage was an older fork of S3BotoStorage from https://github.com/gtaylor/django-athumb, even though I had django-storages installed. The current version of django-storages doesn't suffer from this problem. The content-type headers were unicode when they hit boto, and boto escapes unicode using urllib.quote_plus before sending it on to AWS. This isn't really boto's fault, since per HTTP the headers have to be converted to non-unicode strings somehow. For a more in-depth analysis see https://github.com/boto/boto/issues/1669.

Original Question

I am using django-storages' S3BotoStorage in conjunction with a FileField to upload files to Amazon S3. Here's my field:

downloadable_file = FileField(max_length=255, upload_to="widgets/filedownloads", verbose_name="file") 

In settings:

DEFAULT_FILE_STORAGE = 'storages.backends.s3boto.S3BotoStorage' 

Everything works as far as the uploading/downloading goes.

However, the files are getting stored in my bucket with an incorrect content type. When I look at the metadata for the files in my AWS S3 console, the Content-Type of the file shows up as "application%2Fpdf" instead of "application/pdf", which is what it should be.

[Screenshot: escaped content type]

In case you say it shouldn't matter: it does. Google Chrome's built-in PDF reader hangs on PDFs with an invalid content type, and a client brought this to my attention.

Here's an example of a file uploaded through django-storages/boto. If you're using Chrome's built-in PDF reader, I assume it hangs, like it does for me and the customer who reported this. If you're using a non-Chrome browser or the Adobe plugin, or downloading the file to disk, you'll probably be fine.

If I manually change the content-type metadata via the AWS console to 'application/pdf' (one of the standard choices it provides), then it's fine.
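For files that are already in the bucket, the same correction could in principle be computed in code rather than picked by hand in the console. The following is only a sketch: `repair_content_type` is a made-up helper name, it only handles the "escaped slash" symptom described above, and writing the corrected value back to S3 is not shown.

```python
# Python 3 import; on Python 2 the function lives at urllib.unquote_plus.
from urllib.parse import unquote_plus


def repair_content_type(value):
    """Return an unescaped content type if `value` looks percent-escaped.

    Hypothetical helper: a valid media type contains a literal "/",
    so a value with "%2F" and no "/" is treated as escaped.
    """
    if "/" not in value and "%2f" in value.lower():
        return unquote_plus(value)
    # Already looks like a normal media type; leave it untouched.
    return value


fixed = repair_content_type("application%2Fpdf")
unchanged = repair_content_type("application/pdf")
print(fixed)
print(unchanged)
```

A value that already contains a literal slash is returned as-is, so running the helper over correctly stored files should be harmless.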

I assume this is a bug somewhere in the way boto constructs the AWS policy document to upload the file, since I'm not doing anything outside of the standard usage here. However, I've stepped through the boto code and can't find where it actually does the escaping.

Can someone either suggest a workaround, or guide me to the offending code in boto so I can patch it and submit a pull request?

boto==2.9.5
django-storages==1.1.8

B Robster asked Aug 15 '13 23:08


2 Answers

Not a direct answer to your question, but maybe a useful workaround. I was having issues using django-storages with S3. I ended up trying cuddly-buddly and have been quite happy with it. The author based it on the S3 module from django-storages and has added quite a number of fixes. I browsed through the cuddly-buddly commits and there were some modifications affecting the content-type header, but I can't test with PDF uploads without setting up a new Django project. However, I can verify that none of the files I uploaded through Django have mangled slashes in the content-type field of the S3 metadata.

If for some reason you can't change over to cuddly-buddly for testing, let me know and I'll try to set up a simple Django project to upload some PDFs.

Fiver answered Sep 30 '22 19:09


The problem was that I was using a forked, obsolete version of django-storages that did not convert content-type headers from unicode to plain strings before sending them to boto. boto converts unicode strings to ASCII strings (as required for HTTP headers) using urllib's quote_plus escape mechanism, which turns the "/" in "application/pdf" into "%2F". Switching to the current version of django-storages fixed the problem.

For a more detailed analysis of the issue see: https://github.com/boto/boto/issues/1669#issuecomment-27132112
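The escaping itself is easy to reproduce outside of boto. This is just an illustration of what quote_plus does to the value, not boto's actual code path; it uses the Python 3 location of the function (on Python 2, which boto 2.x targets, it is urllib.quote_plus).

```python
# Python 3 imports; on Python 2 these are urllib.quote_plus / urllib.unquote_plus.
from urllib.parse import quote_plus, unquote_plus

content_type = "application/pdf"

# quote_plus has an empty default `safe` set, so "/" gets percent-escaped.
escaped = quote_plus(content_type)
print(escaped)

# The escaping is reversible, which is why the stored value is recognizable.
restored = unquote_plus(escaped)
print(restored)
```

The first value printed matches the "application%2Fpdf" that showed up in the S3 console.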

B Robster answered Sep 30 '22 17:09