 

Ideal place to store binary data that can be rendered by calling a URL

I am looking for an ideal (performant and maintainable) place to store binary data. In my case these are images. I have to do some image processing, scale the images, and store them in a suitable place that can be accessed via a RESTful service.
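Whatever storage is chosen, the scaling step usually starts by working out the target size. A minimal sketch, assuming the goal is to fit each image inside a bounding box without distorting or upscaling it (the `fit_within` name and the 800 x 800 box are hypothetical):

```python
from typing import Tuple


def fit_within(width: int, height: int, max_w: int, max_h: int) -> Tuple[int, int]:
    """Scale (width, height) to fit inside a max_w x max_h box, keeping the aspect ratio.

    Images that already fit are returned unchanged (no upscaling).
    """
    if min(width, height, max_w, max_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # The tighter of the two axis ratios decides the scale; cap at 1.0 to avoid upscaling.
    scale = min(max_w / width, max_h / height, 1.0)
    # Round to whole pixels, but never let a side collapse to 0.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000, 800, 800))  # (800, 600)
print(fit_within(300, 200, 800, 800))    # (300, 200)
```

With Pillow (the maintained PIL fork), `Image.thumbnail((800, 800))` performs a similar aspect-preserving fit in place; a helper like this is mainly useful when the target size is needed up front, for example to store it as metadata.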

From my research so far I have a few options, like:

  1. A NoSQL solution like MongoDB GridFS
  2. Storing them as files in a directory hierarchy on the file system and using a web server to serve the images by URL
  3. An Apache Jackrabbit document repository
  4. Storing them in a cache, something like Memcache or a Squid proxy
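For option 2, a common way to keep any single directory from growing too large is to derive a short directory hierarchy from a hash of the image ID. A sketch under assumed names (`/var/images`, `/images` and all helper functions are hypothetical); SHA-1 is used here only to spread files evenly, not for security:

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical layout: files live under STORAGE_ROOT and a web server maps
# URL_PREFIX onto that directory (e.g. a static-file location or alias).
STORAGE_ROOT = PurePosixPath("/var/images")
URL_PREFIX = "/images"


def relative_path(image_id: str, ext: str = "jpg") -> PurePosixPath:
    """Return e.g. 'ab/cd/<image_id>.jpg', spreading files over 256 * 256 directories.

    Assumes image_id is already a safe file name (no '/' or '..').
    """
    digest = hashlib.sha1(image_id.encode("utf-8")).hexdigest()
    return PurePosixPath(digest[:2], digest[2:4], f"{image_id}.{ext}")


def storage_path(image_id: str) -> PurePosixPath:
    """Where the image processing job should write the file."""
    return STORAGE_ROOT / relative_path(image_id)


def public_url(image_id: str) -> str:
    """The URL path clients use; the web server resolves it to storage_path()."""
    return f"{URL_PREFIX}/{relative_path(image_id)}"


print(storage_path("photo-42"))
print(public_url("photo-42"))
```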

Any thoughts on which one you would pick, and why, would be useful. Or is there a better way to do it?

dineshr asked Dec 02 '11 14:12


3 Answers

Just started using GridFS to do exactly what you described.

From my experience thus far, the main advantage of GridFS is that it obviates the need for a separate file storage system. Our entire persistence layer is already in Mongo, so the next logical step was to store our files there as well. The flat namespacing just rocks, and it gives you a rich query language to fetch your files based on whatever metadata you attach to them. In our app we used an 'appdata' object that embedded all the ownership information, ensure

Another thing to consider with NoSQL file storage, and especially GridFS, is that it will shard and expand along with your other data. If your entire key-value store already lives on the Mongo server, then if you ever have to expand your cluster with more machines, your file storage will grow along with it.

It can feel a little 'black box' since the binary data itself is split into chunks, a prospect that frightens those used to a classic directory based filesystem. This is alleviated with the help of admin programs like RockMongo.
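The chunking is less mysterious than it sounds: GridFS keeps one document per file in `fs.files` (length, chunk size, filename, upload date and any custom fields) and the bytes in `fs.chunks`, each chunk tagged with the file's `_id` (as `files_id`) and a sequence number `n`. The following is only a pure-Python illustration of that layout, not driver code; the function names are hypothetical:

```python
from typing import Any, Dict, List

# The MongoDB GridFS documentation gives 255 kB (255 * 1024 bytes) as the default chunk size.
DEFAULT_CHUNK_SIZE = 255 * 1024


def split_into_chunks(files_id: Any, data: bytes,
                      chunk_size: int = DEFAULT_CHUNK_SIZE) -> List[Dict[str, Any]]:
    """Cut data into documents shaped like GridFS fs.chunks entries: {files_id, n, data}."""
    return [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks: List[Dict[str, Any]]) -> bytes:
    """Join the chunks back together in order of their sequence number n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


image = bytes(600 * 1024)                # 600 KiB of zeros stands in for an image
chunks = split_into_chunks("img-1", image)
print([len(c["data"]) for c in chunks])  # [261120, 261120, 92160]
print(reassemble(chunks) == image)       # True
```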

All in all, storing images in GridFS is as easy as inserting the documents themselves; the drivers for most major languages handle everything for you. In our environment we accepted image uploads at one endpoint and used PIL to resize them. The images were then fetched from Mongo at another endpoint that just output the data with a JPEG MIME type (image/jpeg).
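The fetch side boils down to looking up the bytes and returning them with an `image/jpeg` Content-Type. A minimal sketch using only the standard library's WSGI interface; the `/images/<id>` route, the `IMAGES` dict and the `image_app` name are assumptions, and a real app would read from GridFS instead of a dict:

```python
from typing import Callable, Dict, Iterable

# Hypothetical in-memory stand-in for the image store; in the answer this role
# is played by GridFS (e.g. fs.get(file_id).read()).
IMAGES: Dict[str, bytes] = {}


def image_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI app: GET /images/<id> returns the stored bytes as image/jpeg, else 404."""
    prefix = "/images/"
    path = environ.get("PATH_INFO", "")
    if environ.get("REQUEST_METHOD") == "GET" and path.startswith(prefix):
        data = IMAGES.get(path[len(prefix):])
        if data is not None:
            start_response("200 OK", [
                ("Content-Type", "image/jpeg"),
                ("Content-Length", str(len(data))),
            ])
            return [data]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, image_app).serve_forever()` serves it on port 8000.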

Best of luck!

EDIT:

To give you an example of a trivial file upload with GridFS, here's the simplest approach in PyMongo, the Python driver.

from pymongo import MongoClient  # MongoClient replaced Connection, which PyMongo 3.0 removed
import gridfs

binary_data = b'Hello, world!'  # GridFS stores bytes

db = MongoClient().test_db
fs = gridfs.GridFS(db)
# the filename kwarg sets the filename in the mongo doc, but you can pass anything in
# and make custom key-values too.
file_id = fs.put(binary_data, filename='helloworld.txt', anykey="foo")
output = fs.get(file_id).read()
print(output.decode())
# prints: Hello, world!

You can also query against your custom values if you like, which can be REALLY useful if you want your queries to be based on custom information specific to your application.

import gridfs

def find_by_anykey(fs, value='foo'):
    try:
        # custom fields are passed as keyword arguments, like in put()
        return fs.get_last_version(anykey=value).read()
    except gridfs.errors.NoFile:
        return None

These are just some simple examples, and the drivers for a lot of the other languages (PHP, Ruby, etc.) all have equivalents.

DeaconDesperado answered Sep 21 '22 21:09


I would go for Jackrabbit in combination with the REST framework Sling: http://sling.apache.org

Sling allows you to upload and download files via REST calls or WebDAV, while the underlying Jackrabbit repository gives you performant storage with the possibility to store your files in a tree structure (or flat, if you like).

Both Jackrabbit and Sling support an event mechanism with which you can asynchronously process the image after upload, e.g. to create thumbnails.
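The underlying pattern (the upload returns immediately, and processing such as thumbnailing happens afterwards) can be sketched generically. This is not Sling's or Jackrabbit's API, which is Java (JCR exposes events through its observation API); it is a small Python illustration of the same idea with hypothetical names, where a queue decouples the upload request from the worker:

```python
import queue
import threading
from typing import Callable, List


class UploadEvents:
    """Toy event bus: publish() returns immediately, listeners run on a worker thread."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[str]" = queue.Queue()
        self._listeners: List[Callable[[str], None]] = []
        threading.Thread(target=self._worker, daemon=True).start()

    def subscribe(self, listener: Callable[[str], None]) -> None:
        self._listeners.append(listener)

    def publish(self, path: str) -> None:
        """Called by the upload handler; does not wait for processing."""
        self._queue.put(path)

    def wait_until_idle(self) -> None:
        """Block until every published event has been handled (handy in tests)."""
        self._queue.join()

    def _worker(self) -> None:
        while True:
            path = self._queue.get()
            for listener in list(self._listeners):
                try:
                    listener(path)
                except Exception as exc:  # keep the worker alive if one listener fails
                    print(f"listener failed for {path}: {exc!r}")
            self._queue.task_done()


thumbnails = {}  # hypothetical: original path -> thumbnail path
events = UploadEvents()
events.subscribe(lambda p: thumbnails.setdefault(p, p.rsplit(".", 1)[0] + ".thumb.jpg"))
events.publish("/content/images/cat.jpg")
events.wait_until_idle()
print(thumbnails)  # {'/content/images/cat.jpg': '/content/images/cat.thumb.jpg'}
```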

The manual at http://sling.apache.org/site/manipulating-content-the-slingpostservlet-servletspost.html describes how to manipulate data using the REST interface provided by Sling.
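The POST servlet described in that manual accepts ordinary HTML-form style POSTs. A minimal sketch with Python's standard library, assuming a local Sling instance at `http://localhost:8080` with the default `admin`/`admin` credentials and a hypothetical `/content/images` folder; it relies on the convention that a file field named `*` creates a node named after the uploaded file, which is worth confirming against the manual for your Sling version:

```python
import uuid
import urllib.request
from base64 import b64encode


def build_sling_upload(base_url: str, folder: str, filename: str, data: bytes,
                       user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build (but do not send) a multipart POST that uploads one JPEG into `folder`.

    The form field name "*" asks the Sling POST servlet to name the new node
    after the uploaded file. Credentials, URLs and the JPEG type are assumptions.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="*"; filename="{filename}"\r\n'
        "Content-Type: image/jpeg\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    url = base_url.rstrip("/") + folder
    req = urllib.request.Request(url, data=head + data + tail, method="POST")
    req.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    token = b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    return req


req = build_sling_upload("http://localhost:8080", "/content/images", "cat.jpg", b"\xff\xd8...")
# urllib.request.urlopen(req) would actually send it to a running Sling instance.
```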

Markus Joschko answered Sep 20 '22 21:09


Storing the images as blobs in an RDBMS is another option. You immediately get some guarantees about integrity, security, etc. (if this is set up properly on the database), you can store extra metadata alongside each image, and you can manage the collection with SQL.
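A minimal sketch of that approach, using Python's built-in sqlite3 as a stand-in RDBMS (the table layout and helper names are hypothetical). The CHECK constraints and the transaction show the kind of integrity guarantees the database provides, and the metadata columns can be queried with plain SQL:

```python
import sqlite3
from typing import Optional, Tuple

# SQLite stands in for "an RDBMS" here; PostgreSQL (BYTEA) or MySQL (BLOB types) work similarly.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE images (
        id        INTEGER PRIMARY KEY,
        filename  TEXT    NOT NULL,
        mime_type TEXT    NOT NULL,
        width     INTEGER NOT NULL CHECK (width > 0),
        height    INTEGER NOT NULL CHECK (height > 0),
        data      BLOB    NOT NULL
    )
""")


def save_image(conn: sqlite3.Connection, filename: str, mime_type: str,
               width: int, height: int, data: bytes) -> int:
    """Insert one image plus its metadata in a transaction and return its id."""
    with conn:  # commits on success, rolls back if the insert fails
        cur = conn.execute(
            "INSERT INTO images (filename, mime_type, width, height, data) "
            "VALUES (?, ?, ?, ?, ?)",
            (filename, mime_type, width, height, data),
        )
    return cur.lastrowid


def load_image(conn: sqlite3.Connection, image_id: int) -> Optional[Tuple[str, bytes]]:
    """Return (mime_type, bytes) for the image, or None if it does not exist."""
    row = conn.execute(
        "SELECT mime_type, data FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    return None if row is None else (row[0], row[1])


image_id = save_image(conn, "cat.jpg", "image/jpeg", 800, 600, b"\xff\xd8 fake jpeg")
print(load_image(conn, image_id))
```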

jaybee answered Sep 20 '22 21:09