
Calculating needed memory for n connection pools for MongoDB running in a Node.js app

I am trying to profile the performance of my Node.js app running MongoDB, currently configured to use a connection pool size of 50. Using Blazemeter I have been running a test that sends 1000 simulated users to my endpoint. On a smaller Amazon EC2 instance (4 CPUs and 7.5 GB of memory) the performance seemed to be CPU bound. When I moved up to a larger machine with at least 8 CPUs, running in pm2 cluster mode, it seems that MongoDB runs out of memory. When the test gets up to about 300-500 simulated users, the mongo process fails:

I.e., I get an error from all db queries, and I see the following message when I try to launch the mongo shell:

    2015-10-26T23:34:56.657+0000 warning: Failed to connect to 127.0.0.1:27017, reason: errno:111 Connection refused
    2015-10-26T23:34:56.658+0000 Error: couldn't connect to server 127.0.0.1:27017 (127.0.0.1), connection attempt failed at src/mongo/shell/mongo.js:146
    exception: connect failed

The first time this happened, I also found the following error in the mongo log:

    exception in initAndListen: 10309 Unable to create/open lock file: /var/lib/mongodb/mongod.lock errno:13 Permission denied Is a mongod instance already running?, terminating

In subsequent tests I saw the same behavior as above but did not see any errors in the mongo log.

When running these tests, mongo usually ends up using about 80 percent of the system's memory before failing.
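
For anyone trying to reproduce this, db.serverStatus() in the mongo shell reports both memory use and open connections:

    // In the mongo shell: check resident memory and open connections
    var s = db.serverStatus();
    printjson(s.mem);          // { resident: ..., virtual: ..., ... } in MB
    printjson(s.connections);  // { current: ..., available: ... }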

Here are the only mongo queries used by this endpoint:

    utility.getNextId(db, "projects", function(err, counter) {
        var pid = counter.seq;
        var newProject = {
            name: projectName,
            path: "/projects/" + user.name + "/" + projectName,
            created: utility.now(),
            modified: utility.now(),
            uid: user.uid,
            pid: pid,
            ip: ip
        };

        // Hierarchy of cloned projects
        if (parentPid)
            newProject.parent = parentPid;

        // Insert the project, then record its pid on the owning user
        db.collection("projects").insert(newProject, function(err, inserted) {
            db.collection("users").update(
                {uid: user.uid},
                {$addToSet: {projects: pid}},
                function(err, _) {
                    callback(err, newProject);
                }
            );
        });
    });
    }; // closes the enclosing function (definition not shown)

    exports.getNextId = function(db, name, callback) {
        db.collection("counters").findAndModify(
            {_id: name},                 // query: one counter document per collection
            [["_id", "asc"]],            // sort
            {$inc: {"seq": 1}},          // atomically increment the sequence
            {upsert: true, new: true},   // create if missing, return the updated doc
            function(err, object) {
                callback(err, object);
            }
        );
    };
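
Side note: findAndModify in this form is the legacy driver call; driver 2.x exposes the same atomic increment as findOneAndUpdate. A minimal sketch, assuming driver 2.x (not the code actually in use here):

    // Sketch: equivalent counter increment with the 2.x driver API
    exports.getNextId = function(db, name, callback) {
        db.collection("counters").findOneAndUpdate(
            {_id: name},
            {$inc: {seq: 1}},
            {upsert: true, returnOriginal: false}, // hand back the incremented doc
            function(err, result) {
                // the 2.x driver wraps the document in result.value
                callback(err, result && result.value);
            }
        );
    };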

Most of this testing was done on an Amazon EC2 m4.4xlarge (16 CPUs and 64 GB of RAM).

Is a connection pool size of 50 too large for a machine with 64 GB of RAM? I would think not. Is there a good way to calculate the amount of memory needed for n connection pools? Or is my issue with the queries I am making?

EDIT: Here is a screenshot showing mongostat right as mongo collapsed on the Amazon EC2 m4.4xlarge (16 CPUs, 64 GB of RAM):

[screenshot: mongostat output at the moment mongod failed]

We create the mongo db object near the top of the file, along with many other requires:

    var mongo = require("mongodb");
    var flash = require("connect-flash");
    var session = require("express-session");
    var auth = require("basic-auth");
    var admin = require("./admin.js");

    // One pool of 50 connections per Node process
    var mongoServer = new mongo.Server("localhost", 27017, {auto_reconnect: true, poolSize: 50});
    var db = new mongo.Db("aqo", mongoServer, {safe: true});
    var busboy = require('connect-busboy');

    db.open(function(err, db) {
        if (err)
            console.warn("mongo-open err:", err);
    });
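
Note that in pm2 cluster mode this file runs once per worker, so each worker opens its own pool of 50. A small sketch to make that visible (the logging line is hypothetical, not in the original code; NODE_APP_INSTANCE is the worker index pm2 sets):

    // Sketch: each pm2 cluster worker executes this module and opens its own pool
    db.open(function(err, db) {
        if (err)
            return console.warn("mongo-open err:", err);
        console.log("pool of 50 opened by worker", process.env.NODE_APP_INSTANCE || "0");
    });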

EDIT: Here are my indexes for the users collection:

    [
        {
            "v" : 1,
            "key" : {
                "_id" : 1
            },
            "name" : "_id_",
            "ns" : "aqo.users"
        },
        {
            "v" : 1,
            "key" : {
                "uid" : 1
            },
            "name" : "uid_1",
            "ns" : "aqo.users"
        }
    ]
asked Oct 26 '15 by Mike2012

1 Answer

Although a pool size of 50 isn't large for a machine with 64 GB of RAM, 800 certainly is, and that is what you have: 16 instances of your node process, each with a pool of 50. The default maximum number of connections is 80% of the available file descriptors, and on Linux the default file descriptor limit is 1024, so you already have nearly the maximum number of connections open. Furthermore, each connection has an overhead of ~10MB, so you are using around 8GB for connections alone. This is obviously not ideal.
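
To make that arithmetic concrete, here is a quick sketch of the numbers (the ~10MB per-connection figure is the approximation mentioned above):

    // Back-of-the-envelope for a 16-CPU box running pm2 cluster mode
    var instances = 16;                        // one Node process per CPU
    var poolSize = 50;                         // per-process pool
    var connections = instances * poolSize;    // 800

    var fdLimit = 1024;                        // common Linux default (ulimit -n)
    var connCap = Math.floor(fdLimit * 0.8);   // mongod caps at 80% of fds => 819

    var approxMemGB = connections * 10 / 1024; // ~10MB each => ~7.8 GB

    console.log(connections + " connections against a cap of " + connCap +
                ", ~" + approxMemGB.toFixed(1) + " GB of connection overhead");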

Ideally, you should reuse the connections in your pool as much as possible. So start your load testing with poolSize set to the default of 5 (i.e. 16*5 = 80 connections in total). You can trust pm2 to distribute the load in round-robin fashion, and a pool of 5 per instance should be perfectly fine and give you optimal performance. If 5 isn't enough, go up a little until you find something suitable.
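
For example, the only change needed in the connection setup shown in the question would be (a sketch of the same Server options with the smaller pool):

    // Same setup as the question, with the pool back at the driver default of 5
    // per instance: 16 instances * 5 = 80 connections in total
    var mongoServer = new mongo.Server("localhost", 27017, {
        auto_reconnect: true,
        poolSize: 5
    });
    var db = new mongo.Db("aqo", mongoServer, {safe: true});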

answered Oct 07 '22 by Rahat Mahbub