When a user uploads an image to my site, the image goes through this process:
So far the site is pretty small, and there are only ~200,000 images in the uploads directory. I realise I'm nowhere near the physical limit of files within a directory, but this approach clearly won't scale, so I was wondering if anyone had any advice on upload / storage strategies for handling large volumes of image uploads.
EDIT: Creating username (or more specifically, userid) subfolders would seem to be a good solution. With a bit more digging, I've found some great info right here: How to store images in your filesystem
However, would this userid dir approach scale well if a CDN is brought into the equation?
Store each image as a file in the file system and create a record in a table with the exact path to that image. Or, store the image itself in a table, using an "image" or "binary data" data type of the database server.
Storing images in a database table is not recommended. There are too many disadvantages to this approach. Storing the image data in the table requires the database server to process and move around huge amounts of binary data, using resources that would be better spent on the query processing it is best suited for.
Generally, databases are best for data and the file system is best for files. It depends on what you're planning to do with the images, though. If you're storing images for a web page, then it's best to store them as files on the server. The web server will very quickly find an image file and send it to a visitor.
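To make the first option concrete, here's a minimal sketch of the "file on disk, path in the database" approach, using Python's standard-library sqlite3 module. The table and column names are my own illustration, not anything prescribed above:

    import sqlite3

    conn = sqlite3.connect("site.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS images (
            id       INTEGER PRIMARY KEY,
            user_id  INTEGER NOT NULL,
            path     TEXT NOT NULL,    -- exact filesystem path to the image
            uploaded TEXT NOT NULL     -- ISO-8601 upload timestamp
        )""")

    def register_upload(user_id, path, uploaded):
        # Store only the path; the image bytes stay on disk, where the
        # web server can serve them directly.
        conn.execute(
            "INSERT INTO images (user_id, path, uploaded) VALUES (?, ?, ?)",
            (user_id, path, uploaded))
        conn.commit()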
I've answered a similar question before but I can't find it, maybe the OP deleted his question...
Anyway, Adam's solution seems to be the best so far, yet it isn't bulletproof, since images/c/cf/ (or any other dir/subdir pair) could still contain up to 16^30 unique hashes, and at least 3 times more files if we count image extensions: a lot more than any regular file system can handle.
AFAIK, SourceForge.net also uses this system for project repositories; for instance, the "fatfree" project would be placed at projects/f/fa/fatfree/. However, I believe they limit project names to 8 chars.
I would store the image hash in the database along with a DATE / DATETIME / TIMESTAMP field indicating when the image was uploaded/processed, and then place the image in a structure like this:
    images/
        2010/                                     - Year
            04/                                   - Month
                19/                               - Day
                    231c2ee287d639adda1cdb44c189ae93.png  - Image Hash
Or:
    images/
        2010/                                     - Year
            0419/                                 - Month & Day (12 * 31 = 372)
                231c2ee287d639adda1cdb44c189ae93.png  - Image Hash
Besides being more descriptive, this structure is enough to host hundreds of thousands of images per day (depending on your file system limits) for several thousand years. This is the way WordPress and others do it, and I think they got it right on this one.
Duplicate images could easily be found by querying the database for the hash, and then you'd just have to create symlinks.
Of course, if this is not enough for you, you can always add more subdirs (hours, minutes, ...).
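Here's a minimal sketch of this layout in Python, assuming MD5 hashes; find_existing_path() is a hypothetical stand-in for the database lookup mentioned above, not a real API:

    import hashlib
    import os
    from datetime import date

    def find_existing_path(digest):
        # Hypothetical stand-in for the database query: return the stored
        # path for this hash if the image was seen before, else None.
        return None

    def store_image(data, ext, root="images"):
        digest = hashlib.md5(data).hexdigest()
        today = date.today()
        # Year/Month/Day directories, as in the first structure above.
        dirpath = os.path.join(root, f"{today:%Y}", f"{today:%m}", f"{today:%d}")
        os.makedirs(dirpath, exist_ok=True)
        path = os.path.join(dirpath, f"{digest}.{ext}")

        existing = find_existing_path(digest)
        if existing and existing != path:
            # Duplicate upload: link to the original instead of copying.
            os.symlink(os.path.abspath(existing), path)
        else:
            with open(path, "wb") as fh:
                fh.write(data)
        return path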
Personally, I wouldn't use user IDs unless you don't have the upload date available in your database.
Regarding the CDN, I don't see any reason this scheme (or any other) wouldn't work...
MediaWiki generates the MD5 sum of the name of the uploaded file, and uses the first letter ("c") and the first two letters ("cf") of the sum "cf1e66b77918167a6b6b972c12b1c00d" to create this directory structure:
images/c/cf/Whatever_filename.png
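A sketch of that scheme in Python; this is my own illustration of the idea, not MediaWiki's actual code:

    import hashlib

    def mediawiki_style_path(filename, root="images"):
        # Hash the *name* of the uploaded file, then use the first one and
        # two hex characters as nested directories.
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
        return f"{root}/{digest[0]}/{digest[:2]}/{filename}"

    # For a name whose sum starts with "cf", this yields e.g.
    # images/c/cf/Whatever_filename.png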
You could also use the image ID for a predictable upper limit on the number of files per directory. Maybe take floor(image unique ID / 1000) to determine the parent directory, for 1000 images per directory.
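A quick sketch of that bucketing (function and parameter names are illustrative):

    import os

    def bucket_path(image_id, ext, root="images"):
        bucket = image_id // 1000  # floor(image unique ID / 1000)
        return os.path.join(root, str(bucket), f"{image_id}.{ext}")

    # IDs 0-999 land in images/0/, 1000-1999 in images/1/, and so on,
    # capping each directory at 1000 files.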