
Storing images and thumbnails on S3 in Django


I'm trying to get my images thumbnailed and stored on S3 using django-storages, boto, and sorl-thumbnail. I have it working, but it's very slow, even with small images. I don't mind it being slow when I save the form and upload the images to S3, but I'd like the image to display quickly after that.

The answer to this SO question explains that the thumbnail won't be created until first access, but that you can use get_thumbnail() to create it beforehand.

Django + S3 (boto) + Sorl Thumbnail: Suggestions for optimisation

I'm doing that, and now it seems that all entries in the thumbnail_kvstore table are created when the image is uploaded, rather than when it is displayed.

The problem is that the page displaying the image is still really slow. Looking at the logging panel in the debug toolbar, there still seems to be a lot of communication with S3. Once the image and thumbnails are uploaded and cached, it seems like the page should render quickly without communicating with S3.

What am I doing wrong? Thanks!

Update: A weak hack seems to have gotten it working, but I'd love to know how to do this properly:

https://github.com/asciitaxi/sorl-thumbnail/commit/545cce3f5e719a91dd9cc21d78bb973b2211bbbf

Update: more information for @sorl

I'm working with 2 views:

ADD VIEW: In this view I submit the form to create the model with the image in it. The image is uploaded to S3. In a post_save signal handler, I call get_thumbnail() to generate the thumbnail before it's needed:

    from sorl.thumbnail import get_thumbnail

    im = get_thumbnail(instance.image, '360x360')
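The idea behind pre-generating in post_save is that the expensive work (resize plus upload) happens once at save time, and later requests only need a metadata lookup. The following is a stdlib-only simulation of that flow, not sorl-thumbnail's real API: `FakeRemoteStorage`, `KVStore` and `get_or_create_thumbnail` are hypothetical names, and the thumbnail size is simplified to the requested geometry.

```python
# Hypothetical simulation: none of these classes are sorl-thumbnail or
# django-storages APIs; they only model where remote calls happen.


class FakeRemoteStorage:
    """Stands in for a remote backend such as S3; counts every call."""

    def __init__(self):
        self.files = {}
        self.remote_calls = 0

    def save(self, name, data):
        self.remote_calls += 1
        self.files[name] = data
        return name


class KVStore:
    """Local metadata cache, loosely modelled on sorl's key-value store."""

    def __init__(self):
        self.data = {}


def get_or_create_thumbnail(source_name, geometry, storage, kvstore):
    """Return cached thumbnail metadata, generating it only on a cache miss."""
    key = (source_name, geometry)
    if key not in kvstore.data:
        thumb_name = f"cache/{geometry}/{source_name}"
        storage.save(thumb_name, b"...resized bytes...")
        # Simplification: pretend the thumbnail exactly fills the geometry.
        w, h = (int(x) for x in geometry.split("x"))
        kvstore.data[key] = {"name": thumb_name, "size": (w, h)}
    return kvstore.data[key]


storage = FakeRemoteStorage()
kv = KVStore()

# "Add view": the post_save handler pre-generates the thumbnail (one remote write).
get_or_create_thumbnail("photo.jpg", "360x360", storage, kv)
after_add = storage.remote_calls

# "Display view": the same lookup is now served entirely from the KV store.
thumb = get_or_create_thumbnail("photo.jpg", "360x360", storage, kv)
print(after_add, storage.remote_calls, thumb["name"])  # 1 1 cache/360x360/photo.jpg
```

If the real display view still opens connections after this point, the extra traffic is coming from something other than thumbnail generation, which is what the measurements that follow try to narrow down.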

DISPLAY VIEW: In this view I display the thumbnail generated in the add view:

    {% thumbnail object.image "360x360" as im %}
        <img src="{{ im.url }}" width="{{ im.width }}" height="{{ im.height }}">
    {% endthumbnail %}

Without the patch:

ADD VIEW: creates 3 entries in the kvstore table and accesses the cache 10 times (6 sets, 4 gets); the logging tab of the debug toolbar says "establishing HTTP connection" 12 times.

DISPLAY VIEW: still just 3 entries in the kvstore table and just 1 get from the cache, but the debug toolbar still says "establishing HTTP connection" 3 times.

With only the change on line 122:

ADD VIEW: same as above, except the logging only says "establishing HTTP connection" 2 times.

DISPLAY VIEW: same as above, except the logging only says "establishing HTTP connection" 1 time.

Also adding the change on line 118:

ADD VIEW: same as above, but now we are down to 2 "establishing HTTP connection" messages.

DISPLAY VIEW: same as above, with no logging messages at all.

UPDATE: It looks like storage._setup() is called twice and storage.url() is called once. Based on the timing, I'd say each one makes a connection to S3:

    1304711315.4 _setup 1304711317.84
    1304711317.84 _setup 1304711320.3
    1304711320.39 _url 1304711323.66

This seems to be reflected by the boto logging, which says "establishing HTTP connection" 3 times.
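A timing log like the one above can be produced by wrapping the suspect methods in a small decorator that records a start time, the method name, and an end time for each call. This is a generic sketch: `timed`, `call_log` and `SlowStorage` are hypothetical, and in a real project you would apply `timed` to whichever storage-backend methods you suspect (which ones are worth wrapping is an assumption that depends on the backend).

```python
import functools
import time

call_log = []  # (start, method name, end) tuples, same shape as the log above


def timed(method):
    """Record wall-clock start/end times for every call to `method`."""

    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return method(*args, **kwargs)
        finally:
            call_log.append((start, method.__name__, time.time()))

    return wrapper


class SlowStorage:
    """Hypothetical stand-in for a storage class whose methods hit the network."""

    def _setup(self):
        time.sleep(0.01)  # pretend to open a connection

    def url(self, name):
        time.sleep(0.01)  # pretend url() also contacts the remote service
        return "https://example.com/media/" + name


# Patch the methods to watch (here on a class we own, for illustration).
SlowStorage._setup = timed(SlowStorage._setup)
SlowStorage.url = timed(SlowStorage.url)

s = SlowStorage()
s._setup()
s._setup()
s.url("thumb.jpg")
for start, name, end in call_log:
    print(f"{start:.2f} {name} {end:.2f} ({end - start:.2f}s)")
```

Each printed line shows how long one call blocked; calls of roughly equal, multi-second duration (as in the log above) are a strong hint that each one performs its own network round trip.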

asciitaxi asked May 04 '11 01:05




2 Answers

As the author of sorl-thumbnail, I am really interested in solving this if it is not working as I intended. If the key-value store is populated, it currently stores three things: name, storage, and size. I have assumed that the URL is derived from the name and therefore should not cause any storage calls. Looking at django-storages, https://github.com/e-loue/django-storages/blob/master/storages/backends/s3boto.py#L214, that seems like a safe assumption to make. In your patch you have patched the read method for some reason. When creating a thumbnail, an ImageFile instance is fetched from the cache (or created if it is not there). You can of course call read() on it, which will read the file, but the intended use is .url, which calls url() on the storage with the cached name; that in turn should be an operation that does not access storage. Could you try to isolate exactly where in your code this storage access happens?
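The design described here can be sketched as follows: a cached entry carries only name, storage and size, so `url`, `width` and `height` are answered locally, and only `read()` touches the backend. `CountingStorage` and `CachedImageFile` are hypothetical illustrations, not sorl-thumbnail classes, and whether a particular backend's url() really stays offline depends on that backend and its configuration.

```python
from dataclasses import dataclass


class CountingStorage:
    """Hypothetical storage: url() is pure string work, open() is 'remote'."""

    base_url = "https://media.example.com/"

    def __init__(self):
        self.remote_calls = 0

    def url(self, name):
        # Mirrors the assumption in the answer: the URL is built from the
        # name alone, so no network round trip is needed.
        return self.base_url + name

    def open(self, name):
        self.remote_calls += 1  # reading the bytes really does hit the backend
        return b"...image bytes..."


@dataclass
class CachedImageFile:
    """What a KV-store entry provides: name, storage and size."""

    name: str
    storage: CountingStorage
    size: tuple

    @property
    def url(self):
        return self.storage.url(self.name)

    @property
    def width(self):
        return self.size[0]

    @property
    def height(self):
        return self.size[1]

    def read(self):
        return self.storage.open(self.name)


storage = CountingStorage()
im = CachedImageFile("cache/ab/cd/thumb.jpg", storage, (360, 240))
print(im.url, im.width, im.height, storage.remote_calls)  # ... 360 240 0
im.read()
print(storage.remote_calls)  # 1
```

Under this model, a template that only uses `im.url`, `im.width` and `im.height` should generate no storage traffic, so any connections seen at render time point to a backend whose url() (or construction) does more than string formatting.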

Also make sure you have THUMBNAIL_DEBUG on and that you have the key value store properly set up.
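A settings fragment along these lines covers both points. `THUMBNAIL_DEBUG` and `THUMBNAIL_KVSTORE` are sorl-thumbnail settings and `CACHES` is Django's; the specific values (memcached on localhost, the backend path) are example assumptions, not requirements.

```python
# settings.py (fragment); values are examples only.

# Let the {% thumbnail %} tag raise errors instead of failing silently.
THUMBNAIL_DEBUG = True

# sorl-thumbnail's default KV store: entries live in the database and are
# cached through Django's cache framework, so a real cache backend matters.
THUMBNAIL_KVSTORE = "sorl.thumbnail.kvstores.cached_db_kvstore.KVStore"

CACHES = {
    "default": {
        # Django 3.2+ backend path; older releases used
        # "django.core.cache.backends.memcached.MemcachedCache".
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}
```

With a dummy cache configured, every KV lookup would fall through to the database, which adds latency but should not by itself cause S3 traffic.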

sorl answered Sep 28 '22 21:09



I'm not sure if your problem is the same as mine, but I found that accessing the width or height property of a normal Django ImageField reads the file from the storage backend, loads it into PIL, and returns the dimensions from there. This is especially costly with a remote backend like the one we're using, and we have very media-heavy pages.

https://code.djangoproject.com/ticket/8307 was opened to address this, but the Django developers closed it as wontfix because they want the width and height properties to always return the true values. So I monkeypatch _get_image_dimensions() to use the model's width_field and height_field values when they are set, which eliminates a large number of the boto messages and improves my page-load times.

Below is my code, modified from the patch attached to that ticket. I put it somewhere that is executed early, such as a models.py.

    from numbers import Number

    from django.core.files.images import ImageFile, get_image_dimensions


    def _get_image_dimensions(self):
        if not hasattr(self, '_dimensions_cache'):
            close = self.closed
            if self.field.width_field and self.field.height_field:
                width = getattr(self.instance, self.field.width_field)
                height = getattr(self.instance, self.field.height_field)
                # check if the fields have proper values
                if isinstance(width, Number) and isinstance(height, Number):
                    self._dimensions_cache = (width, height)
                else:
                    self.open()
                    self._dimensions_cache = get_image_dimensions(self, close=close)
            else:
                self.open()
                self._dimensions_cache = get_image_dimensions(self, close=close)
        return self._dimensions_cache


    ImageFile._get_image_dimensions = _get_image_dimensions
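The core decision in that patch, stripped of Django, is "use stored dimensions if both are numbers, otherwise pay for a read". The sketch below isolates that logic so it can be checked without a project; `image_dimensions` and `slow_read` are hypothetical names, and `slow_read` stands in for opening the remote file and decoding it with PIL. It assumes the model's width_field and height_field get populated, which Django's ImageField does when those fields are configured.

```python
from numbers import Number


def image_dimensions(cached_width, cached_height, read_dimensions):
    """Prefer dimensions stored on the model; fall back to reading the file.

    `read_dimensions` is a zero-argument callable standing in for the
    expensive path (open the remote file, decode it with PIL).
    """
    if isinstance(cached_width, Number) and isinstance(cached_height, Number):
        return (cached_width, cached_height)
    return read_dimensions()


reads = []


def slow_read():
    reads.append(1)  # record each simulated remote read
    return (800, 600)


print(image_dimensions(800, 600, slow_read), len(reads))    # (800, 600) 0
print(image_dimensions(None, None, slow_read), len(reads))  # (800, 600) 1
```

Only rows whose dimension fields are empty trigger a read, which is why the patch cuts down the boto traffic on pages that list many images.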
shadfc answered Sep 28 '22 19:09
