Organizing lots of file uploads

Question

I'm running a website where handling multimedia uploads is one of its primary uses. I'm wondering what the best practices, or the industry standard, are for organizing a lot of user-uploaded files on a server.

Ben Lessani · Accepted Answer

Your question is exceptionally broad, but I'll assume you are talking about storage/organisation/hierarchy of the files (rather than platform/infrastructure).

A typical approach for organisation is to upload files into a three-level hierarchical directory structure based on the filename itself.

E.g., Filename = "My_Video_12.mpg"

which would then be stored in:

/M/y/_/My_Video_12.mpg

Or another example, "a9usfkj_0001.jpg"

/a/9/u/a9usfkj_0001.jpg

This way, you end up with a manageable structure in which a file's location can be derived from its name alone. It also keeps any single directory from accumulating so many entries that accessing it becomes very slow.
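A minimal Python sketch of this scheme follows. The root directory `/srv/uploads`, the function names, and the substitution of `_` for `.` in directory names (because `.` is special as a directory name) are assumptions of the sketch, not part of the answer. It also assumes filenames have already been sanitized.

```python
from pathlib import Path

LEVELS = 3  # depth of the directory hierarchy, as in the examples above


def sharded_path(root: Path, filename: str, levels: int = LEVELS) -> Path:
    """Map e.g. 'a9usfkj_0001.jpg' to root/a/9/u/a9usfkj_0001.jpg."""
    if "/" in filename or "\\" in filename:
        raise ValueError("filename must not contain path separators")
    if len(filename) < levels:
        raise ValueError("filename is shorter than the number of levels")
    # Assumption of this sketch: '.' cannot serve as a directory name, so use '_'.
    dirs = ["_" if ch == "." else ch for ch in filename[:levels]]
    return root.joinpath(*dirs, filename)


def store_upload(root: Path, filename: str, data: bytes) -> Path:
    """Write the upload to its sharded location, creating directories as needed."""
    dest = sharded_path(root, filename)
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    return dest


if __name__ == "__main__":
    print(sharded_path(Path("/srv/uploads"), "My_Video_12.mpg").as_posix())
    print(sharded_path(Path("/srv/uploads"), "a9usfkj_0001.jpg").as_posix())
```

One caveat: user-chosen names tend to share prefixes (many camera files start with `IMG`, for instance), so the first few characters may not spread files evenly across directories. A common variant applies the same idea to a hash of the name or content, or to a random ID, whose characters are close to uniformly distributed.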

Just an idea, but it might be worth being more explicit as to what your question is actually about.

ebaxt · Answer

I don't think you are going to get any concrete answers unless you give more context and describe what the use cases for the files are. Like any other technology decision, the 'best practice' is always going to be a compromise between the different functional and non-functional requirements, so the question needs a lot more context to yield answers that you can act upon.

Having said that, here are some of the strategies I would consider sound options:

1) Use the conventions dictated by the consumer of the files. For instance, if the files are going to be used by a CMS/publishing solution, that system probably has some standardized solution for handling files.

2) Use a third-party upload solution. There are a number of tools that can help guide you to a solution for your specific problem. Tools like Transloadit, Zencoder and Encoding.com all have different options for handling uploads. Looking at those options should give you an idea of what could be considered "industry standard".

3) Look at proven solutions, and mimic the parts that fit your use case. There are open-source solutions that handle the sort of thing you are describing here. Have a look at, for example, Paperclip and its various plugins to learn how they organize files, or, more importantly, what abstractions they provide that let you change your mind when the requirements change.

4) Design your own solution. Do a spike; it's one of the most efficient ways of exposing requirements you haven't thought about. Try integrating one of the tools mentioned above and see how it goes. Software is soft, so no decision is final. Maybe the best solution is to just try something and change it when it no longer fits.
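To make the point about abstractions in option 3 concrete, here is a hypothetical Python interface with two interchangeable backends. It is not Paperclip's actual API (Paperclip is a Ruby library); the names `FileStore`, `LocalDiskStore`, `InMemoryStore` and `handle_upload` are invented for illustration, and keys are assumed to be trusted, already-sanitized relative paths.

```python
from pathlib import Path
from typing import Protocol


class FileStore(Protocol):
    """Hypothetical interface: application code only talks to this."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class LocalDiskStore:
    """Stores each key as a file under a root directory on the server."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        # Keys are assumed sanitized; an unchecked '../' key could escape root.
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryStore:
    """Keeps files in a dict; useful in tests or while prototyping."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def handle_upload(store: FileStore, key: str, data: bytes) -> None:
    # Depends only on the FileStore interface, so switching backends
    # later does not require changing this function.
    store.put(key, data)
```

A cloud backend would be a third class with the same two methods wrapping that provider's SDK; the code that calls `handle_upload` would not need to change.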

This is probably not the concrete answer you were looking for, but as I mentioned at the beginning, design decisions are always a trade-off: the "best practice" in one context could be the worst solution in another :)

Best of luck!

tvdias · Answer

From what I understand, you want a suggestion on how to store the files. If that is what you want, I would suggest having two different storage systems for your files.

The first storage system would be a place to store the physical files, such as a directory on your server (without FTP enabled, accessible to browsers or not, ...), or a service like Amazon S3 (aws.amazon.com/en/s3/), Rackspace CloudFiles (www.rackspace.com/cloud/cloud_hosting_products/files/) or any other storage solution (you could even choose Dropbox, if you want). All of these options offer APIs to save and retrieve files.

The second storage system would be a database to index and control the files. In the database, which could be MySQL, MSSQL or a non-relational store such as Amazon DynamoDB or SimpleDB, you store the link to your file (an HTTP link, the path to the file, or anything similar).

In the database you can also store any metadata about the file that you want, using one or more of @ebaxt's suggestions to obtain it. The metadata could be older versions of the file, the words of a text file, the camera model and geolocation of a picture, etc. Of course, it depends on your needs and how the files will actually be used. You have a very large number of options, but without more information about what you intend to do, it is hard to suggest a specific solution.
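The two-store arrangement can be sketched in Python, with a local directory standing in for the file store and the standard `sqlite3` module standing in for MySQL, MSSQL or DynamoDB. The table layout, the column names (including the example `camera_model` metadata column) and the helper names are assumptions for illustration. A real system would also need to remove the stored file if the database insert fails.

```python
import hashlib
import sqlite3
import tempfile
import uuid
from pathlib import Path
from typing import Optional

# Hypothetical index schema; sqlite3 stands in for the production database.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    id            TEXT PRIMARY KEY,
    original_name TEXT NOT NULL,
    storage_path  TEXT NOT NULL,   -- location in the file store (path or URL)
    size_bytes    INTEGER NOT NULL,
    sha256        TEXT NOT NULL,
    camera_model  TEXT             -- example of optional, type-specific metadata
)
"""


def save_upload(db: sqlite3.Connection, storage_root: Path, original_name: str,
                data: bytes, camera_model: Optional[str] = None) -> str:
    """Write the bytes to the file store, then record them in the index."""
    file_id = uuid.uuid4().hex
    # Physical storage: random ID, sharded by its first three hex characters.
    rel_path = Path(file_id[0], file_id[1], file_id[2], file_id)
    dest = storage_root / rel_path
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    with db:  # commits on success, rolls back on an exception
        db.execute(
            "INSERT INTO files (id, original_name, storage_path, size_bytes, sha256, camera_model)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (file_id, original_name, rel_path.as_posix(), len(data),
             hashlib.sha256(data).hexdigest(), camera_model),
        )
    return file_id


def load_upload(db: sqlite3.Connection, storage_root: Path, file_id: str) -> bytes:
    """Look the file up in the index, then read it from the file store."""
    row = db.execute("SELECT storage_path FROM files WHERE id = ?", (file_id,)).fetchone()
    if row is None:
        raise KeyError(file_id)
    return (storage_root / row[0]).read_bytes()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    with tempfile.TemporaryDirectory() as tmp:
        fid = save_upload(conn, Path(tmp), "holiday.jpg", b"fake-jpeg-bytes",
                          camera_model="ExampleCam")  # hypothetical metadata value
        print(fid, load_upload(conn, Path(tmp), fid))
    conn.close()
```

Keeping the original filename only in the database, and naming the stored file by a random ID, means two users can upload files with the same name without colliding.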

In Amazon's tutorials area (http://aws.amazon.com/articles/Amazon-S3?browse=1) you can find many papers on the subject, such as "Netflix's Transition to High-Availability Storage Systems", "Using the Java Persistence API with Amazon SimpleDB" and "Petboard: An ASP.NET Sample Using Amazon S3 and Amazon SimpleDB".

Regards.


Tags: file, upload

Trevor
