I am looking for an ideal (performance effective and maintainable) place to store binary data. In my case these are images. I have to do some image processing,scale the images and store in a suitable place which can be accesses via a RESTful service.
From my research so far I have a few options, like:
Any thoughts of which one you would pick and why would be useful or is there a better way to do it?
Just started using GridFS to do exactly what you described.
From my experience thus far, the main advantage to GridFS is that it obviates the need for a separate file storage system. Our entire persistency layer is already put into Mongo, and so the next logical step would be to store our filesystem there as well. The flat namespacing just rocks and allows you a rich query language to fetch your files based off whatever metadata you want to attach to them. In our app we used an 'appdata' object that embedded all the ownership information, ensure
Another thing to consider with NoSQL file storage, and especially GridFS, is that it will shard and expand along with your other data. If you've got your entire DB key-value store inside the mongo server, then eventually if you ever have to expand your server cluster with more machines, your filesystem will grow along with it.
It can feel a little 'black box' since the binary data itself is split into chunks, a prospect that frightens those used to a classic directory based filesystem. This is alleviated with the help of admin programs like RockMongo.
All in all to store images in GridFS is as easy as inserting the docs themselves, most of the drivers for all the major languages handle everything for you. In our environment we took image uploads at an endpoint and used PIL to perform resizing. The images were then fetched from mongo at another endpoint that just output the data and mimetyped it as a jpeg.
Best of luck!
EDIT:
To give you an example of a trivial file upload with GridFS, here's the simplest approach in PyMongo, the python library.
from pymongo import Connection
import gridfs
binary_data = 'Hello, world!'
db = Connection().test_db
fs = gridfs.GridFS(db)
#the filename kwarg sets the filename in the mongo doc, but you can pass anything in
#and make custom key-values too.
file_id = fs.put(binary_data, filename='helloworld.txt',anykey="foo")
output = fs.get(file_id).read()
print output
>>>Hello, world!
You can also query against your custom values if you like, which can be REALLY useful if you want your queries to be based off custom information relative to your application.
try:
file = fs.get_last_version({'anykey':'foo'})
return file.read()
catch gridfs.errors.NoFile:
return None
These are just some simple examples, and the drivers for alot of the other languages (PHP, Ruby etc.) all have cognates.
I would go for jackrabbit in combination with its REST framework sling http://sling.apache.org
Sling allows you to upload/download files via REST calls or webdav while the underlying jackrabbit repository gives you a performant storage with the possibility to store your files in a tree structure (or flat if you like).
Both jackrabbit and sling support an event mechanism where you can asynchronously process the image after upload to i.e. create thumbnails.
The manual at http://sling.apache.org/site/manipulating-content-the-slingpostservlet-servletspost.html describes how to manipulate data using the REST interface provided by sling.
Storing the images as blobs in an RDBMS in another option, and you immediately get some guarantees about integrity, security etc (if this is setup properly on the database), store extra metadata, manage the collection with SQL etc.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With