
Meteor: uploading file from client to Mongo collection vs file system vs GridFS

Meteor is great, but it lacks native support for traditional file uploading. There are several options to handle file uploading:

From the client, data can be sent using:

  • Meteor.call('saveFile',data) or collection.insert({file:data})
  • 'POST' form or HTTP.call('POST')

On the server, the file can be saved to:

  • a mongodb file collection by collection.insert({file:data})
  • file system in /path/to/dir
  • mongodb GridFS

What are the pros and cons of these methods, and how best to implement them? I am aware that there are also other options, such as saving to a third-party site and obtaining a URL.

Asked Jan 14 '15 by Green



1 Answer

You can achieve file uploading with Meteor without using any extra packages or a third-party service.

Option 1: DDP, saving file to a mongo collection

    /*** client.js ***/

    // assign a change event to the input tag
    'change input': function (event, template) {
        var file = event.target.files[0]; // assuming 1 file only
        if (!file) return;

        var reader = new FileReader(); // create a reader according to the HTML5 File API

        reader.onload = function (event) {
            var buffer = new Uint8Array(reader.result); // convert to binary
            Meteor.call('saveFile', buffer);
        };

        reader.readAsArrayBuffer(file); // read the file as an arraybuffer
    }

    /*** server.js ***/

    Files = new Mongo.Collection('files');

    Meteor.methods({
        'saveFile': function (buffer) {
            Files.insert({data: buffer});
        }
    });

Explanation

First, the file is grabbed from the input using the HTML5 File API. A reader is created with new FileReader, and the file is read via readAsArrayBuffer. If you console.log the resulting arraybuffer, it returns {}; DDP can't send this over the wire, so it has to be converted to a Uint8Array.

When you pass this to Meteor.call, Meteor automatically runs EJSON.stringify(Uint8Array) and sends it with DDP. You can check the data in the Chrome console's WebSocket traffic; you will see a string resembling base64.

On the server side, Meteor calls EJSON.parse() and converts it back to a buffer.
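
For completeness, here is a minimal sketch of reading the file back out over the same DDP channel. The getFile method name and the Blob reconstruction are my own illustration, not part of the original flow:

    /*** server.js ***/

    Meteor.methods({
        'getFile': function (fileId) {
            // the stored Uint8Array travels back over DDP as EJSON binary
            var doc = Files.findOne(fileId);
            return doc && doc.data;
        }
    });

    /*** client.js ***/

    Meteor.call('getFile', fileId, function (error, buffer) {
        if (error || !buffer) return;
        var blob = new Blob([buffer]); // rebuild a Blob from the Uint8Array
        var url = URL.createObjectURL(blob); // usable in an <a download> or <img src>
    });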

Pros

  1. Simple, nothing hacky, no extra packages
  2. Sticks to the Data on the Wire principle

Cons

  1. More bandwidth: the resulting base64 string is ~ 33% larger than the original file
  2. File size limit: can't send big files (limit ~ 16 MB?)
  3. No caching
  4. No gzip or compression yet
  5. Takes up lots of memory if you publish files

Option 2: XHR, post from client to file system

    /*** client.js ***/

    // assign a change event to the input tag
    'change input': function (event, template) {
        var file = event.target.files[0];
        if (!file) return;

        var xhr = new XMLHttpRequest();
        xhr.open('POST', '/uploadSomeWhere', true);
        xhr.onload = function (event) {...};

        xhr.send(file);
    }

    /*** server.js ***/

    var fs = Npm.require('fs');

    // using the internal webapp or iron:router
    WebApp.connectHandlers.use('/uploadSomeWhere', function (req, res) {
        //var start = Date.now()
        var file = fs.createWriteStream('/path/to/dir/filename');

        file.on('error', function (error) {...});
        file.on('finish', function () {
            res.writeHead(...);
            res.end(); // end the response
            //console.log('Finish uploading, time taken: ' + (Date.now() - start));
        });

        req.pipe(file); // pipe the request to the file
    });

Explanation

The file in the client is grabbed, an XHR object is created and the file is sent via 'POST' to the server.

On the server, the data is piped into the underlying file system. You can additionally determine the filename, perform sanitisation, or check whether it already exists, etc., before saving (see the sketch below).
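
Here is a hedged sketch of those checks; the x-file-name header and the whitelist regex are my own choices for illustration, not part of the original answer:

    var fs = Npm.require('fs');
    var path = Npm.require('path');

    WebApp.connectHandlers.use('/uploadSomeWhere', function (req, res) {
        // keep only the basename and whitelist safe characters
        var raw = req.headers['x-file-name'] || 'upload.bin';
        var name = path.basename(raw).replace(/[^a-zA-Z0-9._-]/g, '_');
        var target = path.join('/path/to/dir', name);

        fs.stat(target, function (error) {
            if (!error) { // stat succeeded, so the file already exists
                res.writeHead(409); // refuse to overwrite
                return res.end();
            }
            var file = fs.createWriteStream(target);
            file.on('finish', function () {
                res.writeHead(200);
                res.end();
            });
            req.pipe(file);
        });
    });

The client would supply the name with xhr.setRequestHeader('x-file-name', file.name) before xhr.send(file).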

Pros

  1. Takes advantage of XHR 2, so you can send an arraybuffer; no new FileReader() is needed, unlike in option 1 (XHR 2 also gives you upload progress; see the sketch after the cons list)
  2. An arraybuffer is less bulky than a base64 string
  3. No size limit; I sent a ~200 MB file on localhost with no problem
  4. The file system is faster than mongodb (more on this in the benchmark below)
  5. Cacheable and gzippable

Cons

  1. XHR 2 is not available in older browsers (e.g. below IE10), but of course you can implement a traditional <form> POST. I only used xhr = new XMLHttpRequest() rather than HTTP.call('POST') because the current HTTP.call in Meteor is not yet able to send an arraybuffer (point me out if I am wrong).
  2. /path/to/dir/ has to be outside Meteor; otherwise, writing a file into /public triggers a reload
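
As a side note on pro 1: XHR 2 also exposes upload progress events, which a plain form POST does not. A minimal client-side sketch against the same '/uploadSomeWhere' route:

    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/uploadSomeWhere', true);

    // fires periodically while the browser streams the file up
    xhr.upload.onprogress = function (event) {
        if (event.lengthComputable) {
            console.log(Math.round(event.loaded / event.total * 100) + '% uploaded');
        }
    };

    xhr.onload = function () {
        console.log('upload finished with status ' + xhr.status);
    };

    xhr.send(file); // 'file' comes from event.target.files[0] as above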

Option 3: XHR, save to GridFS

    /*** client.js ***/

    // same as option 2

    /*** version A: server.js ***/

    var db = MongoInternals.defaultRemoteCollectionDriver().mongo.db;
    var GridStore = MongoInternals.NpmModule.GridStore;

    WebApp.connectHandlers.use('/uploadSomeWhere', function (req, res) {
        //var start = Date.now()
        var file = new GridStore(db, 'filename', 'w');

        file.open(function (error, gs) {
            file.stream(true); // true will close the file automatically once piping finishes

            file.on('error', function (e) {...});
            file.on('end', function () {
                res.end(); // send the end response
                //console.log('Finish uploading, time taken: ' + (Date.now() - start));
            });

            req.pipe(file);
        });
    });

    /*** version B: server.js ***/

    var db = MongoInternals.defaultRemoteCollectionDriver().mongo.db;
    var GridStore = Npm.require('mongodb').GridStore;
    // also need to add Npm.depends({mongodb: '2.0.13'}) in package.js

    WebApp.connectHandlers.use('/uploadSomeWhere', function (req, res) {
        //var start = Date.now()
        var file = new GridStore(db, 'filename', 'w').stream(true); // start the stream

        file.on('error', function (e) {...});
        file.on('end', function () {
            res.end(); // send the end response
            //console.log('Finish uploading, time taken: ' + (Date.now() - start));
        });

        req.pipe(file);
    });

Explanation

The client script is the same as in option 2.

According to the last line of mongo_driver.js in Meteor 1.0.x, a global object called MongoInternals is exposed; you can call defaultRemoteCollectionDriver() to get the current database db object, which is required for the GridStore. In version A, the GridStore is also exposed by MongoInternals. The mongo driver used by current Meteor is v1.4.x.

Then inside a route, you can create a new write object by calling var file = new GridStore(...) (API). You then open the file and create a stream.

I also included a version B. In this version, the GridStore is loaded from a new mongodb driver via Npm.require('mongodb'); this driver is the latest, v2.0.13, as of this writing. The new API doesn't require you to open the file first; you can call stream(true) directly and start piping.
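
For the download side, here is an untested sketch of streaming a file back out of GridFS using the version A driver; the '/downloadSomeWhere' route name is my own:

    var db = MongoInternals.defaultRemoteCollectionDriver().mongo.db;
    var GridStore = MongoInternals.NpmModule.GridStore;

    WebApp.connectHandlers.use('/downloadSomeWhere', function (req, res) {
        var file = new GridStore(db, 'filename', 'r'); // 'r' opens for reading

        file.open(function (error, gs) {
            if (error) {
                res.writeHead(404);
                return res.end();
            }
            res.writeHead(200);
            gs.stream(true).pipe(res); // pipe the stored chunks into the response
        });
    });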

Pros

  1. Same as option 2: sent as an arraybuffer, with less overhead than the base64 string in option 1
  2. No need to worry about file name sanitisation
  3. Separation from the file system: no need to write to a temp dir, and the db can be backed up, replicated, sharded, etc.
  4. No need to implement any other package
  5. Cacheable and can be gzipped
  6. Stores much larger sizes compared to a normal mongo collection
  7. Uses pipe to reduce memory load

Cons

  1. Unstable Mongo GridFS. I included version A (mongo 1.x) and version B (mongo 2.x). In version A, when piping large files > 10 MB, I got lots of errors, including corrupted files and unfinished pipes. This problem is solved in version B using mongo 2.x; hopefully Meteor will upgrade to mongodb 2.x soon.
  2. API confusion. In version A, you need to open the file before you can stream, but in version B, you can stream without calling open. The API doc is also not very clear, and the stream is not 100% syntax-compatible with Npm.require('fs'): in fs, you listen for file.on('finish'), but in GridFS you listen for file.on('end') when the write finishes.
  3. GridFS doesn't provide write atomicity, so if there are multiple concurrent writes to the same file, the final result may be very different
  4. Speed. Mongo GridFS is much slower than the file system.

Benchmark

As you can see in options 2 and 3, I included var start = Date.now(), and when the write ends I console.log the elapsed time in ms. The test machine was a dual-core with 4 GB of RAM and an HDD, running Ubuntu 14.04.
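
A minimal reconstruction of that instrumentation (note the fs stream fires 'finish' while the GridFS stream fires 'end'):

    var start = Date.now();

    file.on('finish', function () { // use 'end' for the GridFS streams
        console.log('Finish uploading, time taken: ' + (Date.now() - start) + ' ms');
    });

The results: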

    file size   GridFS (ms)   FS (ms)
    100 KB          50           2
    1 MB           400          30
    10 MB         3500         100
    200 MB       80000        1240

You can see that FS is much faster than GridFS. For a 200 MB file, it takes ~80 sec using GridFS but only ~1 sec with FS. I haven't tried an SSD; the results may be different. However, in real life the bandwidth may dictate how fast the file streams from client to server; a 200 MB/sec transfer speed is not typical, whereas ~2 MB/sec (the GridFS rate) is more the norm.

Conclusion

By no means is this comprehensive, but you can decide which option best fits your needs.

  • DDP is the simplest and sticks to the core Meteor principle, but the data is bulkier and neither compressible during transfer nor cacheable. It may still be a good option if you only need small files.
  • XHR coupled with the file system is the 'traditional' way: stable API, fast, 'streamable', compressible, and cacheable (ETag etc.), but the files need to live in a separate folder
  • XHR coupled with GridFS gets you the benefits of replica sets and scalability, no touching of the file system dir, large files, and many files if the file system restricts their number; it is also cacheable and compressible. However, the API is unstable, you get errors on multiple writes, and it's s..l..o..w..

Hopefully soon, Meteor DDP can support gzip, caching, etc., and GridFS can be faster...

Answered Oct 21 '22 by Green