Image caching vs. image processing in PHP and S3

Here is the thing: right now I have an e-commerce web site where people can upload a lot of pictures for their products. All the images are stored on Amazon S3. When we need a thumbnail or some other size, I check S3 to see whether one is already available. If not, I generate one, send it to S3, and display it in the browser. Every different-sized thumbnail gets stored on S3, and checking thumbnail availability on every request costs money. I'm afraid I'll pay a lot once the site starts to get more attention (if it ever does...).
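
To make the current setup concrete, the check-then-generate flow looks roughly like this (just a sketch with the AWS SDK for PHP v3 and GD; the bucket layout, key names, and sizes are made up):

```php
<?php
// Sketch of the current per-request flow: check S3 for the thumbnail,
// generate and upload it if it is missing, then return its URL.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

function thumbnailUrl(S3Client $s3, string $bucket, string $productId, int $size): string
{
    $thumbKey = "thumbs/{$productId}_{$size}x{$size}.jpg";

    // Every request pays for this existence check (a HEAD request to S3).
    if (!$s3->doesObjectExist($bucket, $thumbKey)) {
        $original = $s3->getObject(['Bucket' => $bucket, 'Key' => "originals/{$productId}.jpg"]);
        $image    = imagecreatefromstring((string) $original['Body']);
        $thumb    = imagescale($image, $size, $size);

        ob_start();
        imagejpeg($thumb, null, 85);
        $bytes = ob_get_clean();

        $s3->putObject([
            'Bucket'      => $bucket,
            'Key'         => $thumbKey,
            'Body'        => $bytes,
            'ContentType' => 'image/jpeg',
        ]);
    }

    return "https://{$bucket}.s3.amazonaws.com/{$thumbKey}";
}
```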

Thinking about alternatives, I considered keeping only the original images on S3 and processing the resized versions on the fly at every request. I imagine that would cost me in CPU usage, but I haven't run any benchmarks to see how far I can go. The upside is that I wouldn't spend money making requests to and storing extra images on S3, and I could cache everything in the user's browser. I know relying on that isn't entirely safe, which is why I'm bringing the question here.
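
The on-the-fly alternative would then lean on HTTP caching so each browser only pays the resize cost once. A minimal sketch of what the serving script might do; `resizeFromS3()` is a hypothetical helper wrapping the GD code from the previous sketch, and the cache lifetime is an assumption:

```php
<?php
// Sketch: resize on every request and rely on browser caching.
$bytes = resizeFromS3($s3, $bucket, $_GET['id'], (int) $_GET['size']);
$etag  = '"' . md5($bytes) . '"';

// Note: the resize work above has already happened, so a 304 only saves bandwidth, not CPU.
if (($_SERVER['HTTP_IF_NONE_MATCH'] ?? '') === $etag) {
    http_response_code(304);
    exit;
}

header('Content-Type: image/jpeg');
header('Cache-Control: public, max-age=604800'); // let browsers keep it for a week
header('ETag: ' . $etag);
echo $bytes;
```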

What do you think? How do you think I could solve this?

asked by Eber Freitas Dias

2 Answers

I would resize at the time of upload and store all versions in S3.
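
A rough sketch of that approach with GD and the AWS SDK for PHP; the sizes and key scheme below are just examples:

```php
<?php
// Sketch: at upload time, create every size once and store all of them in S3.
// Requests then never have to check whether a thumbnail exists.
function storeAllSizes(Aws\S3\S3Client $s3, string $bucket, string $productId, string $uploadedFile): void
{
    $original = imagecreatefromstring(file_get_contents($uploadedFile));

    // Store the original under a predictable key.
    $s3->putObject([
        'Bucket'      => $bucket,
        'Key'         => "images/{$productId}/original.jpg",
        'Body'        => file_get_contents($uploadedFile),
        'ContentType' => 'image/jpeg',
    ]);

    // Then every resized version, also under predictable keys.
    foreach ([300, 120, 60] as $size) {
        $resized = imagescale($original, $size, $size);

        ob_start();
        imagejpeg($resized, null, 85);
        $bytes = ob_get_clean();

        $s3->putObject([
            'Bucket'      => $bucket,
            'Key'         => "images/{$productId}/{$size}x{$size}.jpg",
            'Body'        => $bytes,
            'ContentType' => 'image/jpeg',
        ]);
    }
}
```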

For example, if you have a larger image (1200x1200, ~200 KB) and create 3 resized versions (300x300, 120x120, and 60x60), you only add about 16%, or 32 KB, per image (for my test image, YMMV). Let's say you need to store a million images; that is roughly 30 GB more, or about $4.50 extra a month. Flickr was reported to have 2 billion images (in 2007); that is ~$9k extra a month, which is not too bad if you are that big.
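
A quick back-of-the-envelope check of those numbers; the ~$0.15/GB-month storage price is my assumption about S3 pricing at the time, not a figure from the answer:

```php
<?php
// Back-of-the-envelope storage cost for the extra resized versions.
$extraPerImageKb = 32;       // ~16% of a 200 KB original
$images          = 1000000;
$pricePerGbMonth = 0.15;     // assumed S3 storage price

$extraGb   = $extraPerImageKb * $images / (1024 * 1024); // ~30.5 GB
$costMonth = $extraGb * $pricePerGbMonth;                // ~$4.58 per month

printf("Extra storage: %.1f GB, about $%.2f/month\n", $extraGb, $costMonth);
```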

Another major advantage is you will be able to use Amazon's CloudFront.

answered by Ambirex


If you're proxying from S3 to your clients (which it sounds like you're doing), consider two optimizations:

  1. At upload time, resize the images all at once and upload them as a single package (tar, XML, whatever); see the sketch after this list.
  2. Cache these image packages on your front-end nodes.
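
A minimal sketch of the packaging idea, assuming PHP's PharData for the tar file; `$s3`, `$bucket`, `$original`, and `$productId` are carried over from the earlier sketches and the key scheme is made up:

```php
<?php
// Sketch: bundle every resized version into one tar and upload it with a single PUT.
$tarPath = "/tmp/{$productId}.tar";
$tar     = new PharData($tarPath);

foreach ([300, 120, 60] as $size) {
    $resized = imagescale($original, $size, $size);

    ob_start();
    imagejpeg($resized, null, 85);
    $tar->addFromString("{$size}x{$size}.jpg", ob_get_clean());
}

// One PUT instead of one per size.
$s3->putObject([
    'Bucket' => $bucket,
    'Key'    => "packages/{$productId}.tar",
    'Body'   => fopen($tarPath, 'rb'),
]);
```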

The 'image package' will reduce the number of PUT/GET/DELETE operations, which aren't free in S3. If you have 4 image sizes, you'll cut the operation count down by a factor of 4.

The cache will further reduce S3 traffic, since I figure the workflow is usually: see a thumbnail -> click it for the larger image.
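
A simple version of that cache on the web nodes could just be the local filesystem. A sketch, assuming the cache directory exists and a one-day TTL (both are arbitrary choices):

```php
<?php
// Sketch: check a local filesystem cache before going to S3 at all.
function cachedPackage(Aws\S3\S3Client $s3, string $bucket, string $productId): string
{
    $cacheFile = "/var/cache/images/{$productId}.tar"; // directory assumed to exist and be writable

    // Serve from the local disk if we fetched this package recently.
    if (is_file($cacheFile) && filemtime($cacheFile) > time() - 86400) {
        return $cacheFile;
    }

    // Otherwise one GET from S3, saved for the next request hitting this node.
    $s3->getObject([
        'Bucket' => $bucket,
        'Key'    => "packages/{$productId}.tar",
        'SaveAs' => $cacheFile,
    ]);

    return $cacheFile;
}
```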

On top of that, you can implement a 'hot images' cache that is actively pushed to your web nodes so it's pre-cached if you're using a cluster.

Also, I don't recommend using Slicehost <-> S3. The transit costs are going to kill you. You should really use EC2 to save a ton of bandwidth (money!).

If you aren't proxying, but are handing your clients S3 URLs for the images, you'll definitely want to preprocess all of your images. Then you don't have to check for them; you just pass the URLs to your client.
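
In that case the per-request work reduces to string building, with no S3 calls at all. A tiny sketch; the bucket name and key scheme are examples only:

```php
<?php
// Sketch: with every size pre-generated at upload, the page just emits URLs.
// No existence check, no proxying, no resize work per request.
function imageUrl(string $bucket, string $productId, int $size): string
{
    return "https://{$bucket}.s3.amazonaws.com/images/{$productId}/{$size}x{$size}.jpg";
}

echo '<img src="' . imageUrl('my-bucket', $productId, 120) . '" alt="thumbnail">';
```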

Re-processing the images every time is costly. You'll find that if you can assume all images are already resized, the amount of work on your web nodes goes down and everything speeds up. This is especially true since you aren't firing off multiple S3 requests.

answered by Gary Richardson