 

Shutting down VERY busy production node

Tags:

node.js

I have a very, very busy node application on a production server. The app handles real-time chat (using websockets) as well as e-commerce payments. Everything is set up so that when the server goes down the clients reconnect their sockets and so on, but I still have a problem: whenever the server is stopped with a SIGINT, the event loop is cut off. This means that any pending DB write (possibly for a financial transaction) is simply discarded. There are two especially crucial moments (for instance, when the credit card merchant gives the OK, but before we write the record to the DB), so at the moment we only shut the server down at off-peak times to reduce the chance of problems. But this is bad.

I am thinking of this as a solution:

  • I send a custom UNIX signal to the process (SIGUSR2 for example?);
  • When server.js gets the signal:
    • It stops listening to port 80
    • It waits for the event loop to dry up
    • If after 10 seconds it's still hanging, it forces the closure

This means that at each reboot the server will be down for at most 10 seconds.

Is this what people in the real world do? Any gotcha? How do I check that the event loop is empty?

Merc asked Nov 08 '22 01:11


1 Answer

I hope this addresses your question, or at least helps (it was too long for a comment).

This is where load balancers are most helpful: you can control how much traffic a particular server gets, to the point where, if you need to shut the server down, you can tell with certainty that it is no longer being used. Since you have websockets open directly with the server, it is very likely that those connections are tied to that server and cannot be proxied through the load balancer (I'm not sure about that), but if no new connections are created, the existing ones will eventually die off.

Alternatively, consider a poor man's version of the load balancer: set up a proxy on this server that forwards requests to other servers. If all your state is persisted in a common database, no operations will be disrupted, and you can give the event loop enough time (whatever that is) to finish.

As for server usage, if you don't currently have any way to tell what is going on with the event loop, any application logs you have on the server may help determine what your application is doing, and good judgement will tell you how safe it is to shut it down at a particular point. (Again, the more you can reduce usage before that, the better.)

Also, as Archimendix suggested, using process.on() to handle graceful termination is pretty much the standard across platforms. (It reminds me of a lot of Java-based servers that need some time to shut down.) Depending on the severity of the effects of a non-terminating app, you may wish to let the process hang on a little longer or even perform shutdown procedures, but keep in mind that this is not always possible.
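For what it's worth, here is a minimal sketch of the process.on() approach, following the steps from the question (stop listening, wait, force the exit after 10 seconds). It is only an illustration, not a drop-in implementation: createInFlightTracker, gracefulShutdown and start are hypothetical names, the setTimeout stands in for a real DB write, and it assumes every critical write can be wrapped in a promise. Rather than trying to inspect the event loop itself, the sketch counts the operations you actually care about.

```javascript
const http = require('http');

// Hypothetical helper: counts critical operations (e.g. DB writes) still running.
function createInFlightTracker() {
  let pending = 0;
  let waiters = [];
  return {
    // Wrap a promise so it is counted until it settles.
    track(promise) {
      pending += 1;
      return promise.finally(() => {
        pending -= 1;
        if (pending === 0) {
          waiters.forEach((resolve) => resolve());
          waiters = [];
        }
      });
    },
    pending: () => pending,
    // Resolves once no tracked operation is pending.
    whenIdle() {
      if (pending === 0) return Promise.resolve();
      return new Promise((resolve) => waiters.push(resolve));
    },
  };
}

// Stop accepting connections, wait for tracked work, force the exit on timeout.
function gracefulShutdown(server, tracker, timeoutMs,
                          exit = (code) => process.exit(code)) {
  const timer = setTimeout(() => exit(1), timeoutMs); // safety net
  server.close(); // no new connections are accepted from here on
  return tracker.whenIdle().then(() => {
    clearTimeout(timer);
    exit(0);
  });
}

// Example wiring (assumption: a plain http server; call start(80) yourself).
function start(port) {
  const tracker = createInFlightTracker();
  const server = http.createServer((req, res) => {
    // Placeholder for a real DB write.
    const write = new Promise((resolve) => setTimeout(resolve, 50));
    tracker.track(write).then(() => res.end('ok'));
  });
  server.listen(port);
  process.on('SIGUSR2', () => gracefulShutdown(server, tracker, 10000));
  return server;
}
```

SIGUSR2 is a reasonable choice here because Node reserves SIGUSR1 for the debugger, so something like kill -USR2 <pid> would start the drain. Keep in mind that server.close() only stops new connections from being accepted; already-upgraded websocket connections should stay open until the process exits, which is acceptable if the clients reconnect as described in the question.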

Finally, try to avoid dependency on any particular server altogether. Controlled shutdowns are easy to handle, but outages and hardware failures will not give you the benefit of waiting for an event loop.

Alpha answered Nov 15 '22 12:11