
Is GridFS fast and reliable enough for production?

I am developing a new website and want to use GridFS as storage for all user uploads, because it offers a lot of advantages compared to normal filesystem storage.

Benchmarks with GridFS served by nginx indicate that it is not as fast as a normal filesystem served by nginx.

Benchmark with nginx

Is anyone already using GridFS in a production environment, or would you use it for a new project?

Railsmechanic asked Aug 05 '10 08:08


1 Answer

I use GridFS at work on one of our servers, which is part of a price-comparison website with respectable traffic (around 25k visitors per day). The server doesn't have much RAM (2 GB), and the CPU isn't really fast (Core 2 Duo, 1.8 GHz), but it has plenty of storage space: 10 TB (SATA) in a RAID 0 configuration. The job the server does is very simple:

Each product on our price-comparison site has an image (there are around 10 million products according to our product DB). The server's job is to deliver the image to the visitor's browser if it is already stored in the grid; if it is not, the server downloads the image, resizes it, stores it in GridFS, and then delivers it. So this could be called a 'traditional CDN schema'.
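The lookup-or-fetch flow described above can be sketched roughly as follows. This is only an illustration, not the PHP script from this answer: serve_image, download and resize are hypothetical names, and FakeGrid is an in-memory stand-in so the example runs without a MongoDB server. The three calls it mimics (exists, put, get_last_version) follow the PyMongo gridfs.GridFS interface, so a real GridFS object should be usable in its place.

```python
import io


class FakeGrid:
    """In-memory stand-in mimicking the few gridfs.GridFS calls used below."""

    def __init__(self):
        self._files = {}

    def exists(self, filename):
        return filename in self._files

    def put(self, data, filename):
        self._files[filename] = data
        return filename  # a real GridFS returns the new file's _id

    def get_last_version(self, filename):
        # A real GridFS returns a GridOut; BytesIO offers the same .read()
        return io.BytesIO(self._files[filename])


def serve_image(fs, name, download, resize):
    """Return the image bytes for `name`, filling the grid on a miss."""
    if fs.exists(filename=name):
        # Hit: the resized image is already stored, deliver it directly.
        return fs.get_last_version(filename=name).read()
    # Miss: download the original, resize it, store it, then deliver it.
    data = resize(download(name))
    fs.put(data, filename=name)
    return data
```

With PyMongo installed, fs would presumably be created as gridfs.GridFS(MongoClient().some_db), where some_db is a placeholder database name.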

We have stored and processed 4 million images on this server since it went live. The resizing and storing is done by a simple PHP script, though a Python script or something like Java could certainly be faster.

Current data size: 11.23 GB

Current storage size: 12.5 GB

Indices: 5

Index size: 849.65 MB
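As a rough sanity check on these figures, here is a back-of-the-envelope calculation. It assumes the sizes are binary units (GiB/MiB) and that the data size covers exactly the 4 million images mentioned above; neither is confirmed here, so treat the results as orders of magnitude only. Under those assumptions the average stored image comes out at about 3 KB, and the indexes at roughly 7% of the data size.

```python
# Figures quoted in the answer; the binary units are an assumption.
GIB = 1024 ** 3
MIB = 1024 ** 2

data_bytes = 11.23 * GIB
storage_bytes = 12.5 * GIB
index_bytes = 849.65 * MIB
image_count = 4_000_000  # assumption: the stats cover exactly these images

avg_image_bytes = data_bytes / image_count         # average size of one image
index_ratio = index_bytes / data_bytes             # index size relative to data
storage_overhead = storage_bytes / data_bytes - 1  # allocated but unused space

print(f"average image size: {avg_image_bytes / 1024:.2f} KiB")
print(f"index size vs data: {index_ratio:.1%}")
print(f"storage overhead:   {storage_overhead:.1%}")
```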

About reliability: it is very reliable. The server isn't under heavy load, the index size is fine, and queries are fast.

About speed: it is certainly not as fast as local file storage, maybe 10% slower, but it is fast enough to be used in real time even when the image needs to be processed, which in our case depends heavily on PHP. Maintenance and development time has also been reduced: deleting one or several images became very simple, just query the DB with a simple delete command. Another interesting point: when we rebooted our old server, which used local file storage (millions of files in thousands of folders), it sometimes hung for hours because the system was performing a file integrity check. We no longer have this problem with GridFS; our images are now stored in large MongoDB data files (2 GB each).
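To illustrate the point about deleting images with a simple query, here is a minimal sketch. It is not the code we run: delete_images is a hypothetical helper, and FakeGridStore is an in-memory stand-in that only supports equality filters, so the example runs without a database. The find and delete calls are modelled on the PyMongo gridfs.GridFS interface, where find takes a filter document and delete takes a file's _id.

```python
from types import SimpleNamespace


class FakeGridStore:
    """In-memory stand-in for the find/delete part of gridfs.GridFS."""

    def __init__(self, names):
        self._files = {
            i: SimpleNamespace(_id=i, filename=name)
            for i, name in enumerate(names)
        }

    def find(self, query):
        # Only simple equality filters are supported in this stand-in.
        return [
            f for f in self._files.values()
            if all(getattr(f, key, None) == value for key, value in query.items())
        ]

    def delete(self, file_id):
        self._files.pop(file_id, None)


def delete_images(fs, query):
    """Delete every stored file matching `query`; return how many were removed."""
    ids = [grid_out._id for grid_out in fs.find(query)]
    for file_id in ids:
        fs.delete(file_id)
    return len(ids)
```

For example, delete_images(fs, {"filename": "a.jpg"}) removes every stored version of that file, with no directory tree to walk.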

So, in my opinion: yes, GridFS is fast and reliable enough to be used in production.

Manu Eidenberger answered Oct 04 '22 16:10