
Thousands of images, how should I organize the directory structure? (linux)

Thousands of users are uploading thousands of pictures to my Linux server, which is hosted by 1and1.com (I believe they use CentOS, but I am unsure of the version). This is a language-agnostic question, but for reference, I am using PHP.

My first thought was to dump them all in the same directory, but I remember that, a while ago, there was a limit on how many files or directories a single directory could hold.

My second thought was to partition the files into directories based on the user's email address (which I am already using as the user name), but I don't want to run into the limit on the number of directories in a directory either.

Anyhow, for images from [email protected], I was going to do this:

/images/domain.com/user/images...

Is this smart? What if thousands of users have, say, 'gmail' addresses? Perhaps I could go even deeper, like this:

/images/domain.com/[first letter of user name]/user/images...

so for [email protected] it would be...

/images/domain.com/m/mike/images...

Is this a bad approach? What is everyone else doing? I don't want to run into problems with too many directories either.


Related:

  • How many files in a directory is too many?
  • Optimum web folder structure for ~250,000 images
  • How to store images in your filesystem
  • Tips for managing a large number of files?
MichaelICE asked May 23 '09 00:05



2 Answers

I would do the following:

  1. Take an MD5 hash of each image as it comes in.
  2. Write that MD5 hash in the database where you are keeping track of these things.
  3. Store the files in a directory structure that uses the first couple of characters of the MD5 hex string as directory names. So if the hash is 'abcdef1234567890', you would store the file as 'a/b/abcdef1234567890'.

Using a hash also lets you merge the same image uploaded multiple times.
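
A minimal Python sketch of the steps above, in case it helps to see them together. The function name `hashed_path` and the `/images` root are assumptions made for illustration, and the database write from step 2 is left out:

```python
import hashlib
from pathlib import PurePosixPath


def hashed_path(data: bytes, root: str = "/images") -> PurePosixPath:
    """Build a storage path from the MD5 hex digest of the image bytes.

    The first two hex characters become two directory levels, so each
    level holds at most 16 subdirectories.
    """
    digest = hashlib.md5(data).hexdigest()
    return PurePosixPath(root, digest[0], digest[1], digest)


# Identical bytes always map to the same path, which is what makes
# duplicate uploads easy to detect and merge.
first = hashed_path(b"example image bytes")
second = hashed_path(b"example image bytes")
print(first == second)
```

With two levels of one hex character each, files are spread over 256 leaf directories. Taking more characters per level (for example `digest[:2]`) would widen the fan-out if that turns out to be too few.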

Joe Beda answered Oct 18 '22 23:10



To expand upon Joe Beda's approach:

  • database
  • database
  • database

If you care about grouping or finding files by user, original filename, upload date, photo-taken-on date (EXIF), etc., store this metadata in a database and use the appropriate queries to pick out the appropriate files.
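
As a rough illustration of keeping the metadata in a database, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The table name, column names, helper functions, and sample values are all hypothetical; a real deployment would more likely use the MySQL or similar server already behind the site:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE images (
        id            INTEGER PRIMARY KEY AUTOINCREMENT,
        md5           TEXT NOT NULL,
        user_email    TEXT NOT NULL,
        original_name TEXT NOT NULL,
        content_type  TEXT NOT NULL,
        uploaded_at   TEXT NOT NULL
    )
    """
)


def record_upload(md5, user_email, original_name, content_type, uploaded_at):
    """Insert one metadata row and return its primary key."""
    cur = conn.execute(
        "INSERT INTO images (md5, user_email, original_name, content_type, uploaded_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (md5, user_email, original_name, content_type, uploaded_at),
    )
    conn.commit()
    return cur.lastrowid


def images_for_user(user_email):
    """Return (id, original_name) pairs for one user, oldest first."""
    rows = conn.execute(
        "SELECT id, original_name FROM images WHERE user_email = ? ORDER BY id",
        (user_email,),
    )
    return rows.fetchall()


# Placeholder hashes and addresses, purely for demonstration.
first = record_upload("0" * 32, "mike@example.com", "cat.jpg", "image/jpeg", "2009-05-23T00:05:00")
second = record_upload("f" * 32, "anna@example.com", "dog.png", "image/png", "2009-05-23T00:06:00")
mikes = images_for_user("mike@example.com")
```

Grouping by user then becomes a query rather than a property of the directory layout, so the layout is free to be whatever keeps the directories small.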

Use the database primary key (whether a file hash or an autoincrementing number) to locate files among a fixed set of directories. Alternatively, use a fixed maximum number of files N per directory, and when one directory fills up, move on to the next: the kth photo is stored at {somepath}/aaaaaa/bbbb.jpg, where aaaaaa = floor(k/N) and bbbb = mod(k,N), each formatted as decimal or hex. If that's too flat a hierarchy for you, use something like {somepath}/aa/bb/cc/dd/ee.jpg.
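
The floor/mod arithmetic can be sketched in Python as follows. The function names, the `/images` root standing in for {somepath}, the choice N = 1000, and the zero-padding widths are assumptions picked for illustration:

```python
def path_for_id(k: int, per_dir: int = 1000, root: str = "/images") -> str:
    """Two-level layout: directory = floor(k / N), file = k mod N, in decimal."""
    directory, index = divmod(k, per_dir)
    return f"{root}/{directory:06d}/{index:04d}.jpg"


def nested_path_for_id(k: int, root: str = "/images") -> str:
    """Deeper layout: ten hex digits of k split into pairs, aa/bb/cc/dd/ee.jpg."""
    digits = f"{k:010x}"
    pairs = [digits[i:i + 2] for i in range(0, 10, 2)]
    return f"{root}/{'/'.join(pairs[:-1])}/{pairs[-1]}.jpg"


print(path_for_id(1234567))         # /images/001234/0567.jpg
print(nested_path_for_id(1234567))  # /images/00/00/12/d6/87.jpg
```

In the nested variant each directory holds at most 256 entries, at the cost of four directory levels per lookup.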

Don't expose the directory structure directly to your users. If they are using web browsers to access your server via HTTP, give them a URL like www.myserver.com/images/{primary key} and send the proper file type in the Content-Type header.
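
One way to picture this indirection is a small lookup that turns the public key into an internal path plus a Content-Type header. This is only a sketch: the `CATALOG` dict stands in for the database, and the names, sample entry, and return shape are hypothetical and not tied to any particular web framework:

```python
# Stand-in for the database: primary key -> (internal path, MIME type).
CATALOG = {
    42: ("/images/000000/0042.jpg", "image/jpeg"),
}


def resolve(key: int):
    """Map a public image key to (status, headers, internal path).

    The internal path is never sent to the client; the server only
    uses it to read the file it streams back.
    """
    entry = CATALOG.get(key)
    if entry is None:
        return 404, {}, None
    path, content_type = entry
    return 200, {"Content-Type": content_type}, path


status, headers, path = resolve(42)
```

Because clients only ever see the key, the on-disk layout can be reorganized later without breaking any published URLs.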

Jason S answered Oct 18 '22 21:10
