
Node.js: What's a good way to automatically restart a node server that's not responding?

I've inherited a node.js/Express app that is a bit of a mess. It's regularly and fairly randomly getting stuck and not responding to any request until it is restarted.

I suspect that something within the app is blocking: either it gets stuck in a loop, or it makes a request to an external API without using proper async techniques, never gets a response and never times out. At that point the server just stops responding but doesn't crash.

I would obviously like to find the culprit code and fix the problem; however, in the meantime I would like to find a way to automatically restart the server when it stops responding.

To test out solutions locally (since I don't currently know the actual culprit), I have created the following Express route, which simulates the exact behavior I'm getting.

// Simulates the bug: a synchronous infinite loop blocks the event loop
app.get('/block-block-block', function (req, res) {
  for (;;) {}
});

My question is: given that the above route gets hit (which immediately stops the server from responding to anything), is there a way to detect the blockage internally in Node and restart or shut down? And if not, what is a good solution for checking when the server is not responding and restarting it?

Most of the searching I have done leads me to tools like forever and PM2. These work great if your app crashes, but I don't see any functionality for restarting when an app is randomly blocking.

thinktt asked Feb 03 '16 19:02


3 Answers

I figured out how to solve this using native Node functionality. Migg's answer was good and led me in the right direction, but it still doesn't show how to automatically restart when the event loop is completely blocked.

The trick is to use Node's native child_process module and the fork method to start the server from another node instance and have that instance ping the server for responses, restarting it when it's stuck. This is similar to how Forever and PM2 work. It's hard to believe there's not a simple way to implement this with either of those libraries, but this is how you can do it naively.

I have commented this code heavily to point out what everything is doing. Also note that I am using ES2015's Arrow Functions. Go read about them if you're not familiar.

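// monitor.js: runs the app as a child process and restarts it when it stops answering heartbeats
var fork = require('child_process').fork;
var server, heartbeat, checkTimer;

function startServer () {
  console.log('Starting server');
  // path of the app to supervise; change './server.js' to your entry point
  server = fork('./server.js');
  heartbeat = null;

  //when the server goes down (crash, exit or kill), stop checking and restart it
  server.on('close', (code) => {
    clearTimeout(checkTimer); // avoid two overlapping check loops after a restart
    startServer();
  });

  //when server sends a heartbeat message save it
  server.on('message', (message) => {
    heartbeat = message ? message.heartbeat : null;
  });

  //ask the server for a heartbeat
  server.send({request: 'heartbeat'});

  //wait 5 seconds and check if the server responded
  checkTimer = setTimeout(checkHeartbeat, 5000);
}

function checkHeartbeat() {
  if (heartbeat) {
    console.log('Server is alive');

    //clear the heartbeat and send a request for a new one
    heartbeat = null;
    server.send({request: 'heartbeat'});

    //set another heartbeat check
    checkTimer = setTimeout(checkHeartbeat, 5000);

  } else {
    console.log('Server looks stuck...killing');
    // SIGKILL cannot be caught, so it works even if the app installed a
    // SIGTERM handler that a blocked event loop would never get to run
    server.kill('SIGKILL');
  }
}

startServer();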

Be sure to replace server.js with whatever Node app you want to run.

Now, in your server, add the following to respond to the heartbeat request:

//listen and respond to heartbeat request from parent
process.on('message', (message) => {
  if(message && message.request === 'heartbeat') {
    process.send({heartbeat: 'thump'});
  }
});

Finally, add a timeout to test that it works (not for production!):

//block the event loop after 30 seconds
setTimeout(() => {
  for(;;){}
}, 30000);
thinktt answered Nov 15 '22 08:11


First of all, you should try to find the problems in the code by reviewing it.

Memory Leaks

For the running app, you should use pm2. It has a setting that restarts the app when it consumes too much memory. Directly from the docs:

pm2 start big-array.js --max-memory-restart 20M

Or using an ecosystem.json:

{
    "apps": [{
        "script": "big-array.js",
        "max_memory_restart": "20M"
    }]
}
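If you cannot use pm2, a rough in-process stand-in is to check memory on a timer and exit when it grows too large, leaving the restart to whatever supervises the process. This is a minimal sketch under assumptions, not how pm2 implements the feature: `parseSize`, `isOverMemoryLimit` and `watchMemory` are hypothetical names, the limit is compared against `process.memoryUsage().rss`, and K/M/G are assumed to be powers of 1024.

```javascript
// Rough in-process stand-in for pm2's max_memory_restart (not pm2's own implementation).
// Assumes an external process manager (pm2, forever, a fork-based monitor) restarts the app after exit.

// Hypothetical helper: turn a size string such as "20M" into bytes.
// K/M/G are treated as powers of 1024 here; that is an assumption.
function parseSize(size) {
  const match = /^(\d+)([KMG])$/i.exec(size);
  if (!match) {
    throw new Error(`Unrecognised size: ${size}`);
  }
  const units = { K: 1024, M: 1024 ** 2, G: 1024 ** 3 };
  return Number(match[1]) * units[match[2].toUpperCase()];
}

// True when the resident set size exceeds the limit.
function isOverMemoryLimit(limitBytes, usage = process.memoryUsage()) {
  return usage.rss > limitBytes;
}

// Hypothetical watchdog: checks memory every intervalMs and exits with code 1
// once the limit is exceeded, so the supervising process manager restarts the app.
function watchMemory(limit, intervalMs = 5000) {
  const limitBytes = parseSize(limit);
  const timer = setInterval(() => {
    if (isOverMemoryLimit(limitBytes)) {
      console.error(`Memory above ${limit}, exiting so the app can be restarted`);
      process.exit(1);
    }
  }, intervalMs);
  timer.unref(); // the watchdog alone should not keep the process running
  return () => clearInterval(timer); // call to stop watching
}
```

Calling `watchMemory('20M')` once at startup mirrors the config above. Note that this only helps with leaks; a timer inside the app cannot fire while the event loop is blocked.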

There are also several good articles online about debugging memory leaks in Node.js, and there is even a module that reports leaks, which we used in the early days. The subject is too big to cover here.

Blocking Event Loop / Infinite Loops

You can instrument your app to report the responsiveness of the event loop, so that if some code blocks the loop for too long, you can terminate the process programmatically. You will have to look at process.nextTick.

For example, you can introduce a measurement that schedules a process.nextTick callback every X seconds and, if the callback takes longer than some defined time to run, calls process.exit(1) to terminate the process and let pm2 restart it.
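A minimal sketch of that measurement follows. It swaps in a repeating `setInterval` timer instead of `process.nextTick`: if the timer fires much later than scheduled, the loop was busy for roughly that long. The names `measureLag`, `startLagMonitor` and `exitOnStall` and the default thresholds are hypothetical. Because the check runs on the same event loop, it only fires once the loop is free again, so it catches long but finite stalls and can never detect a permanent `for(;;){}`; for that you still need an external watchdog like the fork-based monitor above.

```javascript
// Sketch: detect event-loop stalls by measuring how late a repeating timer fires.
// Names and thresholds are illustrative assumptions, not a library API.

// How much later than scheduled did the timer fire (in ms)?
function measureLag(lastTick, now, intervalMs) {
  return Math.max(0, now - lastTick - intervalMs);
}

// Default reaction: exit with an error code so pm2 (or similar) restarts the app.
function exitOnStall(lag) {
  console.error(`Event loop stalled for ~${lag} ms, exiting`);
  process.exit(1);
}

function startLagMonitor({ intervalMs = 1000, maxLagMs = 5000, onStall = exitOnStall } = {}) {
  let lastTick = Date.now();
  const timer = setInterval(() => {
    const now = Date.now();
    const lag = measureLag(lastTick, now, intervalMs);
    lastTick = now;
    // Only runs after any blocking code has finished, so it reports the stall afterwards.
    if (lag > maxLagMs) {
      onStall(lag);
    }
  }, intervalMs);
  timer.unref(); // the monitor alone should not keep the process alive
  return () => clearInterval(timer); // call to stop monitoring
}
```

Calling `startLagMonitor()` once at startup is enough. Node 11.10 and later also provide `perf_hooks.monitorEventLoopDelay()` for histogram-style delay measurements.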

The upside of this would be that your app runs most of the time. The downside would be that all users with open connections would get no answer when process.exit is called.

Debugging

To find memory leaks and other problems in the running code, you should dive into https://www.joyent.com/developers/node/debug. There is a whole section about MDB that will help you find the problems, though it will take some time to get used to. All of this is too much information to include here, so I'm linking to it instead.

Best of luck with your app!

crackmigg answered Nov 15 '22 09:11


I have run into this problem once or twice, and the answer has always been to hand-roll a standalone monitoring service that sends requests to an endpoint at regular intervals. After a certain number of failed or timed-out requests, the service restarts the server.
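A bare-bones version of such a monitoring service might look like the sketch below. The endpoint URL, the names `checkOnce`, `nextFailureCount` and `startHealthMonitor`, and the default interval, timeout and failure threshold are all hypothetical. What "restart" means (a pm2 command, a systemd unit, killing a PID) is left to the `onUnhealthy` callback rather than guessed at here.

```javascript
// Sketch of a standalone health-check poller; names and defaults are hypothetical.
const http = require('http');

// Resolves true if the endpoint answers with a 2xx status within timeoutMs.
function checkOnce(url, timeoutMs) {
  return new Promise((resolve) => {
    const req = http.get(url, (res) => {
      res.resume(); // discard the body so the socket is freed
      resolve(res.statusCode >= 200 && res.statusCode < 300);
    });
    // A blocked server typically accepts the connection but never replies.
    req.setTimeout(timeoutMs, () => {
      req.destroy();
      resolve(false);
    });
    req.on('error', () => resolve(false)); // refused, reset, destroyed, ...
  });
}

// Consecutive-failure counter: any success resets it.
function nextFailureCount(count, ok) {
  return ok ? 0 : count + 1;
}

function startHealthMonitor({ url, intervalMs = 10000, timeoutMs = 5000, maxFailures = 3, onUnhealthy }) {
  let failures = 0;
  // Keep timeoutMs below intervalMs so checks never overlap.
  const timer = setInterval(async () => {
    const ok = await checkOnce(url, timeoutMs);
    failures = nextFailureCount(failures, ok);
    if (failures >= maxFailures) {
      failures = 0;
      onUnhealthy(); // e.g. restart the server process
    }
  }, intervalMs);
  return () => clearInterval(timer); // call to stop monitoring
}
```

For example, `startHealthMonitor({ url: 'http://localhost:3000/health', onUnhealthy: restartApp })`, where `restartApp` is your own hypothetical restart routine. With these defaults a hung server is restarted roughly half a minute after it stops answering, which shows how the thresholds translate directly into downtime.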

It's not without drawbacks, however. The most obvious is that your application has to fail or hit some threshold before being restarted. That means it could be down in production for minutes or even hours before a restart, depending on your thresholds. However, the alternative is to wait for consumers of the application to start complaining, which probably sucks worse, since they are most likely your customers.

Ben Glasser answered Nov 15 '22 07:11