Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Way to lower memory usage by mongoose when doing query

I am working on node backend trying to optimize a very heavy query to mongodb via mongoose. The expected return size is considerable, but for some reason when I make the request, node begins consuming huge amounts of memory, like, 200mb+ for a single big request.

Considering the size of the return is less than 10mb in most cases, this doesn't seem right. It also refuses to let go of memory after it has finished, I know this is probably just the V8 GC doing its default behavior, but what concerns me is the huge amount of memory being consumed for a single find() request.

I've isolated it down through testing to the find() call. Once done with the call, it performs some post processing then sends the data to a callback, all in an anonymous function. I have tried using the querystream instead of the model.find(), but it shows no real improvements.

Looking around has not yielded any responses, so I will ask, is there a known way to reduce, control, or optimize the memory usage by mongoose? Does anyone know why so much excess memory is being used for a single call?

EDIT

As per Johnny and Blakes suggestions, using mixture of lean() with streaming, and using pause and resume have improved the runtime and memory usage immensely. Thank you!

like image 462
Javeed Avatar asked Jul 22 '15 23:07

Javeed


People also ask

How do I limit memory in MongoDB?

MongoDB, in its default configuration, will use will use the larger of either 256 MB or ½ of (ram – 1 GB) for its cache size. You can limit the MongoDB cache size by adding the cacheSizeGB argument to the /etc/mongod.

Why is MongoDB using so much memory?

MongoDB keeps all data in its caches for performance reason and it will not release them until its memory increase to 50% of (system RAM - 1 GB) (default). This means its memory will increase continuously when you write/read data from it. You can find additional information documented here.

Is Mongoose slower than MongoDB?

Mongoose, a neat ODM library for MongoDB used in Node. js projects, has plenty of useful features that make developers' lives easier. It manages relationships between data, has schema validation, and overall allows coding 3-5 times faster.


1 Answers

Default mongoose .find() of course returns all results as an "array", so that is always going to use memory with large results, so this leaves the "stream" interface.

The basic problem here is that you are using a stream interface ( as this inherits from the basic node stream ) each data event "fires" and the associated event handler is executed continuosly.

This means that even with a "stream" your subsequent actions in the event handler are "stacking" up, at the very least consuming lots of memory and possibly eating up the call stack if there are further asynchronous processes being fired in there.

So the best thing you can do is start to "limit" the actions in your stream processing. This is as simple as calling the .pause() method:

var stream = model.find().stream();   // however you call

stream.on("data",function() {
    // call pause on entry
    stream.pause();

    // do processing
    stream.resume();            // then resume when done
});

So .pause() stops the events in the stream being emitted and this allows the actions in your event handler to complete before continuing so that they are not all coming at once.

When your handling code is complete, you call .resume(), either directly within the block as shown here of within the callback block of any asynchronous action performed within the block. Note that the same rules apply for async actions, and that "all" must signal completion before you should call resume.

There are other optimizations that can be applied as well, and you might do well to look a "queue processing" or "async flow control" available modules to aid you in getting more performance with some parallel execution of this.

But basically think .pause() then process and .resume() to continue to avoid eating up lots of memory in your processing.

Also, be aware of your "outputs" and similarly try to use a "stream" again if building up something for a response. All this will be for nothing if the work you are doing is just actually building up another variable in memory, so it helps to be aware of that.

like image 100
Blakes Seven Avatar answered Oct 03 '22 10:10

Blakes Seven