Restarting an Erlang node after a segmentation fault

I'm running an Erlang application that calls C code through NIFs. If a segmentation fault occurs in the C code, the entire Erlang virtual machine goes down, taking the node and the application with it.

What is the best way to monitor the Erlang application and restart it if the virtual machine dies?

Lee Torres asked Nov 11 '13 14:11


2 Answers

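Have a look at heart, which ships with Erlang/OTP. When the VM is started with erl -heart, a small external program watches it and, if the VM dies or stops responding, restarts it using the command in the HEART_COMMAND environment variable.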

In addition, if you have NIF calls that are considered dangerous, it is recommended to isolate them, together with the Erlang code closest to them, on a separate node. There are several ways of monitoring and restarting a node (e.g. the slave module).

In general, however, I would advise against using problematic NIFs. Depending on what you are using them for, there are more stable alternatives.

Reason for NIF -> replacement

Sequential speed -> better-optimized Erlang code. The high sequential speed of NIFs often comes at the price of interfering with Erlang's schedulers, which frequently results in worse overall performance.

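Interfacing with external libs/apps -> Erlang's ports are much better at failure isolation, because the external code runs in its own OS process and a crash there does not take down the VM.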
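
To illustrate the isolation idea behind ports, here is a rough sketch in Python rather than Erlang, so it is an analogy of the principle and not the open_port API. The worker snippets and the call_worker name are made up for the example; os.abort() stands in for a crash in native code.

```python
import subprocess
import sys

# Stand-in for crash-prone native code: abort() kills the process outright,
# much like a segmentation fault would.
CRASHING_WORKER = "import os; os.abort()"
# A well-behaved worker that echoes its input in upper case.
HEALTHY_WORKER = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def call_worker(source, payload):
    """Run the worker in its own OS process and return (ok, result)."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        input=payload,
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # The worker died; the caller survives and only sees an error value.
        return False, proc.returncode
    return True, proc.stdout


if __name__ == "__main__":
    print(call_worker(HEALTHY_WORKER, "ping"))
    print(call_worker(CRASHING_WORKER, "ping"))
```

The caller keeps running after the second call. With a port the Erlang VM is in the same position as the caller here: it is notified that the external program exited instead of crashing along with it.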

Peer Stritzinger answered Sep 19 '22 03:09


I've used something called supervisord. Some advantages over heart:

  1. It's not Erlang-specific, so if you have other services on the same box, you can use it to restart those as well.
  2. heart can show some odd behavior that prevents crash dumps from being written.
  3. If you actually want to stop the Erlang process for some reason, supervisord makes this easier.
  4. If the segfault occurs at startup, heart will keep restarting Erlang indefinitely. supervisord stops trying to restart after a certain number of attempts.
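
The bounded-restart behavior from point 4 can be sketched in a few lines of Python. This is not how supervisord is implemented, only a minimal illustration of the idea; the supervise function, its parameters, and the erl command line at the bottom are assumptions made for the example.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay=1.0):
    """Run cmd, restarting it after an abnormal exit, at most max_restarts times.

    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        code = subprocess.call(cmd)  # blocks until the process exits
        exit_codes.append(code)
        if code == 0:
            break  # clean shutdown: do not restart
        # On POSIX a negative code means "killed by a signal",
        # e.g. -11 for SIGSEGV.
        print(f"run {attempt + 1} ended with exit code {code}", file=sys.stderr)
        if attempt < max_restarts:
            time.sleep(delay)  # brief pause before the next attempt
    return exit_codes


if __name__ == "__main__":
    # Hypothetical command line: the node has to stay in the foreground,
    # so it is started without -detached.
    supervise(["erl", "-noinput", "-sname", "mynode"])
```

A node that segfaults on every start is tried max_restarts + 1 times and then left down, instead of being restarted forever.
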
James Kingsbery answered Sep 19 '22 03:09