
MongoDB - Storing the files in DB or on external storage?

Tags:

mongodb

We are using MongoDB, and our application needs to store multiple files per user. The application will be deployed on AWS. I am considering the following options for storing the files:

  1. Store the files in MongoDB itself. I am aware that MongoDB limits documents to a maximum size of 16 MB, but our files will most likely stay below that.
  2. Use MongoDB's GridFS to store the files.
  3. Use Amazon S3 to store the files, and store the links to these files in MongoDB.
  4. Use the local file system/EBS volumes on AWS.

Which of these is the better approach in terms of performance and scalability? I also want to use a CDN to cache the files. My preference is AWS S3: I can put a CDN in front of it, my file storage will be DB-agnostic, and my DB size will not grow significantly because the files are stored outside the DB.

Dattatray asked Feb 05 '15 04:02




2 Answers

I think the best options here are GridFS and S3; I would go with the latter myself. Push the file to S3 and then store the bucket name and file key in your Mongo document. Unless your business or querying requirements are such that all the data must be present in the document, I think this is the best way to go.

I've used this solution in production and it scales very easily. The impact to your Mongo collection is small and you don't have to worry about storing huge amounts of data there. Just store the key and let S3 take care of all that. You can always store them somewhere else later since your system is fairly storage-agnostic.
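
As a rough illustration of the "push to S3, store the bucket name and key in Mongo" flow, here is a minimal Python sketch. The function name `save_user_file` and the document field names are made up for this example. It assumes you pass in a boto3 S3 client (its `put_object` call) and a PyMongo collection (its `insert_one` call); error handling, retries, and credentials are left out.

```python
from datetime import datetime, timezone


def save_user_file(s3_client, collection, user_id, bucket, key, data, content_type):
    """Upload bytes to S3, then record only a pointer to them in MongoDB.

    s3_client is assumed to behave like boto3.client("s3"), and
    collection like a pymongo Collection. Field names are illustrative.
    """
    # Single-request upload of the raw bytes to the given bucket/key.
    s3_client.put_object(Bucket=bucket, Key=key, Body=data, ContentType=content_type)

    # The Mongo document holds the location and a little metadata,
    # never the file contents, so the collection stays small.
    record = {
        "user_id": user_id,
        "bucket": bucket,
        "key": key,
        "size": len(data),
        "content_type": content_type,
        "uploaded_at": datetime.now(timezone.utc),
    }
    collection.insert_one(record)
    return record
```

Because the document only stores `bucket` and `key`, moving the files to another store later mostly means changing how those two fields are resolved.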

Nate Barbettini answered Oct 15 '22 07:10


First of all, I would only use the local file system in development; in production I would use GridFS or Amazon S3.

Let me go through the options one by one.

First point (Store the files in MongoDB itself)

  1. If your images are small, or at least below the 16 MB document limit, you can store them directly in your collections.

Cons

Every query that returns these documents also pulls in the embedded image data, so it will take a bit longer (you can exclude the 'images' field from the query results to avoid this).
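
A minimal sketch of excluding the field, assuming PyMongo and assuming the binary data lives in a field called `images` as above. The helper name `list_documents_without_images` is hypothetical; the second argument to `find()` is the projection.

```python
def list_documents_without_images(collection, query=None):
    """Return matching documents with the large 'images' field left out.

    collection is assumed to behave like a pymongo Collection.
    """
    # {"images": 0} is an exclusion projection: every field is returned
    # except 'images', so the file bytes are not sent to the client.
    return list(collection.find(query or {}, {"images": 0}))
```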

Second point (Use MongoDB's GridFS to store the files.)

  1. Use it if you are handling files larger than 16 MB.
  2. You can attach metadata to each file (I like this).
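
To make the two points above concrete, here is a simplified pure-Python sketch of how GridFS lays a file out: the bytes are split into chunk documents (`files_id`, `n`, `data`) and a separate file document carries the length, chunk size, and your metadata. The helper names are made up, and fields such as `uploadDate` are omitted. The 255 kB default chunk size is the GridFS default.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size, in bytes


def split_into_chunks(file_id, data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Build chunk documents shaped like those in the fs.chunks collection."""
    return [
        # 'n' is the zero-based position of the chunk within the file.
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]


def build_file_document(file_id, filename, data, metadata, chunk_size=DEFAULT_CHUNK_SIZE):
    """Build a (simplified) file document shaped like those in fs.files."""
    return {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        "metadata": metadata,  # free-form, application-defined fields
    }
```

Since every chunk is its own small document, the 16 MB limit applies per chunk rather than per file. In practice you would not write this yourself; with PyMongo, for example, `gridfs.GridFS(db).put(data, filename=..., metadata={...})` handles the chunking.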

Take a look at the GridFS docs.

This article about the pros and cons of using GridFS can be helpful too,

as can "When should I use GridFS?"

Third point (Use Amazon S3)

I'm not very familiar with S3 (I have never used it).

But this is from the Amazon docs, under "When should I use Amazon S3?":

S3 is free to join, and is a pay-as-you-go service, meaning you only ever pay for any of the hosting and bandwidth costs that you use, making it very attractive for start-up, agile and lean companies looking to minimize costs.

On top of this, the fully scalable, fast and reliable service provided by Amazon, makes it highly attractive to video producers and marketers all over the world.

Amazon offers S3 as a hosting system, with pricing dependent on the geographic location of the datacenter where you store your videos.

Fourth point (Use local file system)

I only use the file system when I'm testing my apps. I never use it in production because, from my point of view, it's not very scalable.

Personally I would use GridFS, but I think you have to analyze the requirements of your application to know which storage adapter to use.

Ethaan answered Oct 15 '22 06:10