Node.js+Express Randomly Drops Requests, resulting in a Gateway Timeout

Tags:

node.js

EDIT

After much putzing around, I finally found something that seems like it might be a solid lead:

The express app fails to accept incoming requests while the process is using the Node+OAuth module to perform several outbound requests (e.g., to Facebook, Twitter, etc.). I was able to determine this by putting lots of logs throughout my code, where I discovered that the "begin-request" log (described below) was not triggered while an outbound request was in flight.

I have been able to demonstrate that when the Node+OAuth module is making a few outbound requests, inbound requests to my API (via a browser window) hang, and are not received until one of those outbound OAuth requests has finished.
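For illustration, here is roughly the shape of those timing logs around a node-oauth call (a sketch; the endpoint, keys, and tokens are placeholders, not my real values):

var OAuth = require('oauth').OAuth;
var oa = new OAuth('https://api.twitter.com/oauth/request_token',
                   'https://api.twitter.com/oauth/access_token',
                   'consumerKey', 'consumerSecret',
                   '1.0A', null, 'HMAC-SHA1');

console.log('outbound-begin', Date.now());
oa.get('https://api.twitter.com/1/account/verify_credentials.json',
       'accessToken', 'accessSecret',
       function(err, body) {
           console.log('outbound-end', Date.now(), err ? 'error' : 'ok');
       });

// If an inbound request's "begin-request" log only appears after
// "outbound-end", the inbound side was stalled during the outbound call.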

Of course, I've already done:

require('http').globalAgent.maxSockets = 999;

Per a suggestion in IRC, I've added

console.log(require('http').globalAgent.requests);

But this always logs {}, which AFAIK means that no outbound requests are queued waiting for a free socket. (Note that globalAgent.requests only tracks outbound requests; it says nothing about inbound connections.)
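A periodic dump of the agent's outbound state can make this concrete (a sketch; both objects are keyed by 'host:port' and cover outbound traffic only):

var http = require('http');

setInterval(function() {
    console.log('open sockets:', Object.keys(http.globalAgent.sockets));
    console.log('queued requests:', Object.keys(http.globalAgent.requests));
}, 1000);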

Thus I am left to conclude that, for some reason, either node.js or express is blocking incoming requests because of the outbound ones, even though there should be plenty of sockets available...

Anybody have any hints on how to solve this?


I have an API built in node.js using Express, Mongoose, etc., deployed on the Amazon Cloud, which works wonderfully and fast 99% of the time.

Except, once in a while, a request seems to be dropped or otherwise ignored. I'm talking about requests that usually complete in milliseconds randomly going unanswered, with no clear picture of why.

The symptom is a simple "Gateway Timeout" when connecting to the API endpoint. An identical request, made from the same client with the same params, just moments before or moments after, will work just fine.

Of course, my first thought was "duh, server overload!" So I spent a lot of time optimizing my requests, MongoDB, etc. Finally I got to the point where CPU/disk/RAM usage across the board (on both the Node.js servers and the Mongo servers) was very low. I use Scout and RightScale to track my servers in real time, and I log any request or query that takes more than 100ms. My node servers currently have 5GB of FREE RAM, 70% free CPU (on the 1st core), etc. So I'm 99.99% certain it is not a performance issue.

Finally, I fell back on a desperate attempt: I attached a random number ("ruid") to every request made by my client(s). Then, in the node.js app, I do a console.log() when the request is first received and when it completes. E.g., here's the middleware I use in express:

var configureAPI = function() {
    return function(req, res, next) {
        if (req.body && req.body.ruid)
            console.log(req.body.ruid);

        // more middleware stuff...
        next();
    };
};

server.configure(function() {
    server.use(express.bodyParser());
    server.use(configureAPI());
    server.use(onError);

    // ... more config stuff
});
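To also log the completion side, one common Express 3 pattern is to wrap res.end (a sketch along those lines, not my exact middleware; the "ruid" handling mirrors the code above):

var requestLogger = function(req, res, next) {
    var start = Date.now();
    var ruid = req.body && req.body.ruid;
    console.log('begin-request', ruid);

    // Wrap res.end so we also log when the response actually goes out.
    var end = res.end;
    res.end = function() {
        console.log('end-request', ruid, (Date.now() - start) + 'ms');
        end.apply(res, arguments);
    };

    next();
};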

What I found shocked me: apparently, the node.js app is not even receiving the requests in question. I have a JavaScript webapp, and I print the "ruid" sent with each request to its console. Whenever the request succeeds, there is a corresponding "ruid" printed in the node.js console. Whenever it times out, there is not.


Edit: more debugging & info.

My app servers started out (and continue) serving PHP as well, so they have Apache, etc. installed. I needed http://streamified.me to serve my website (PHP) and http://api.streamified.me to serve my API (node.js)... so I have a rule in my httpd.conf file that proxies requests for api.streamified.me (instead of streamified.me) to node.js on port 8888:

RewriteCond %{HTTP_HOST} ^api.streamified.me
RewriteRule ^(.*) http://localhost:8888$1 [P]

So, in the same httpd.conf file, I turned on RewriteLogLevel 5 and then created a simple PHP+CURL script on my localhost to hit api.streamified.me with a random URL (which should cause node.js to return a simple "not found" response) until it resulted in a Gateway Timeout. Here, you can see that it has happened -- the rewrite log shows that the request was definitely received by the app server and forwarded to port 8888... but it was never received by node.js (or, at least, the first line of code in the first piece of middleware never sees it...)

[Screenshot: Apache rewrite log showing the request being forwarded to localhost:8888]


I've been over and over my node.js code and am quite certain I do not have blocking code, and even if I did, I cannot imagine it blocking the thread long enough to miss a request without it raising a red flag somewhere.

What am I missing? Is there some reason the incoming socket would be blocked? I do make a fair number of HTTP requests to external APIs via my node.js app, but AFAIK that should not be blocking incoming sockets.
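One way to double-check that nothing is blocking the thread is a crude event-loop lag probe (a sketch; the 500ms interval and 100ms threshold are arbitrary):

// If the single thread is ever blocked, this timer fires late and
// the measured lag spikes.
var last = Date.now();
setInterval(function() {
    var now = Date.now();
    var lag = now - last - 500;
    if (lag > 100)
        console.log('event loop blocked for ~' + lag + 'ms');
    last = now;
}, 500);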


Of course, I have error logging in place. I have enabled it at the process level...

process.addListener("uncaughtException", function (err) {
    // some logging code
});

and at the Express level (the onError handler above). I know that my error logging functions work, because I've seen both of them fire before. But neither of them reports anything around the time of the dropped requests, nor do I see anything in the console...
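For reference, the onError handler has the shape of Express 3 error middleware, which Express identifies by its four-argument signature (the body here is a stand-in, not my exact logging code):

var onError = function(err, req, res, next) {
    console.error('express error:', err.stack || err);
    res.send(500, 'Internal Server Error');
};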


  • Express Version: 3.0.0rc5
  • Node.js version: 0.8.12
  • 2 instances of the node.js app running on a standard Amazon Cloud setup (m1.large instances), behind 2 Load Balancers, connecting to a 3x replica set of MongoDBs (also m1.large)
asked Oct 12 '12 by Zane Claes


1 Answer

It sounds like you are locking up your Node thread for too long, causing incoming connections to time out before it gets around to handling them. Node is single-threaded, so it only does one thing at a time; it can't choose to block an incoming request because of an outgoing one. It can only fail to accept an incoming request because it is busy doing something else. You need to figure out what it is busy doing.
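To see the failure mode concretely, here is a minimal, self-contained demo (hypothetical code, not taken from the question) of one synchronous operation stalling every other request:

var http = require('http');

http.createServer(function(req, res) {
    if (req.url === '/block') {
        // Busy-wait for 5 seconds: nothing else runs on the single thread.
        var until = Date.now() + 5000;
        while (Date.now() < until) {}
    }
    res.end('ok\n');
}).listen(8888);

// While GET /block is busy-waiting, a concurrent GET / is not even
// accepted until the loop finishes -- behind a proxy with a short
// timeout, that surfaces as a Gateway Timeout.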

Does everything work fine when you don't make the outbound requests? If so, you need to look at the code making those requests and make sure you aren't waiting around for the responses.

answered Oct 07 '22 by Bill