
java: inconsistent watchdog timeout in systemd-notify

Tags: java, systemd

My Java application is installed on openSUSE 13.2, and I'm using systemd (version 210) for process control.

I would like to take advantage of the systemd watchdog functionality using systemd-notify. However, I notice the app restarting due to inconsistent timeouts from the watchdog.

With WatchdogSec=120 and the app configured to call systemd-notify every 60 seconds, I observe a restart every 5 to 20 minutes on average.

Here is the (slightly redacted) systemd unit file for the process:

# Cool systemd service
[Unit]
Description=Something Awesome
After=awesomeparent.service
Requires=awesomeparent.service

[Service]
Type=simple
WorkingDirectory=/opt/awesome
Environment="AWESOME_HOME=/opt/awesome" 
User=awesomeuser
Restart=always
WatchdogSec=120
NotifyAccess=all
ExecStart=/home/awesome/jre1.8.0_05/bin/java -jar awesome.jar

[Install]
WantedBy=multi-user.target

And here is the code for calling systemd-notify:

// The runtime name typically has the form "pid@hostname"; keep only the PID.
String pidStr = ManagementFactory.getRuntimeMXBean().getName();
pidStr = pidStr.split("@")[0];

String cmd = "/usr/bin/systemd-notify";

// Spawn systemd-notify as a child process, merging stderr into stdout.
Process process = new ProcessBuilder(cmd,
                                     "MAINPID=" + pidStr,
                                     "WATCHDOG=1").redirectErrorStream(true)
                                                  .start();

// Log the helper's output only when it exits with a non-zero code.
int exitCode = process.waitFor();
if (exitCode != 0) {
    String output = IOUtils.toString(process.getInputStream());
    Log.MAIN_LOG.error("Failed to notify systemd:" +
                       (output.isEmpty() ? "" : " " + output) +
                       " Exit code: " + exitCode);
}

In the logs, I never see the failure message (the process always returns exit code 0), and I'm 100% sure that the task IS being executed once per minute, on the minute. I can see the task's log entry immediately before each restart.

Anyone have any ideas why systemd-notify just doesn't work sometimes?

I'm thinking about writing code to call sd_pid_notify directly, but would like to know if there's a simple config thing I can do before going that route.

Kyle Fransham asked Nov 27 '15 20:11


1 Answer

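Calling sd_notify in-process through JNA, instead of spawning systemd-notify for every ping, solved the problem. The systemd-notify documentation notes that a notification can only be attributed to its unit reliably if the sending process still exists when systemd handles the message, which a short-lived helper process does not guarantee. Here's the JNA code: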

import com.sun.jna.Library;
import com.sun.jna.Native;

/**
 * The task issues a notification to the systemd watchdog. The systemd watchdog
 * will restart the service if the notification is not received.
 */
public class WatchdogNotifierTask implements Runnable {

  private static final String SYSTEMD_SO = "systemd";
  private static final String WATCHDOG_READY = "WATCHDOG=1";

  @Override
  public void run() {
    try {
      // sd_notify returns a negative errno-style code on failure, 0 if
      // NOTIFY_SOCKET is not set (nothing was sent), and a positive value on success.
      int returnCode = SystemD.INSTANCE.sd_notify(0, WATCHDOG_READY);
      if (returnCode < 0) {
        Log.MAIN_LOG.error(
            "Systemd watchdog returned a negative error code: " + returnCode);
      } else {
        Log.MAIN_LOG.debug("Successfully updated systemd watchdog.");
      }
    } catch (Exception e) {
      Log.MAIN_LOG.error("calling sd_notify native code failed with exception: ", e);
    }
  }

  /**
   * This is a Linux-specific interface to load the systemd shared library and call the
   * sd_notify function. Should we need other systemd functionality, it can be loaded here.
   * It uses JNA for native library calls.
   */
  interface SystemD extends Library {
    // Native.loadLibrary is deprecated in newer JNA releases in favor of Native.load.
    SystemD INSTANCE = (SystemD) Native.loadLibrary(SYSTEMD_SO, SystemD.class);

    int sd_notify(int unset_environment, String state);
  }
}
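
The task above still has to be scheduled. A minimal sketch of one way to do that follows; it is not part of the original solution. It assumes the service manager exports WATCHDOG_USEC (the watchdog timeout in microseconds) to the service, as systemd does when WatchdogSec= is set, and it pings at half that timeout, which is the interval the sd_watchdog_enabled documentation recommends and matches the 60-second ping used with WatchdogSec=120 in the question. The class name WatchdogScheduler and its methods are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not from the original answer) that schedules watchdog pings. */
final class WatchdogScheduler {

    private WatchdogScheduler() {}

    /**
     * Converts a WATCHDOG_USEC value (microseconds, as a decimal string) into a
     * ping interval of half the timeout, in milliseconds.
     * Returns -1 if the value is missing, malformed, or not positive.
     */
    static long pingIntervalMillis(String watchdogUsec) {
        if (watchdogUsec == null) {
            return -1;
        }
        try {
            long usec = Long.parseLong(watchdogUsec.trim());
            if (usec <= 0) {
                return -1;
            }
            // Half the timeout, converted to milliseconds, never below 1 ms.
            return Math.max(1, usec / 2 / 1000);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /**
     * Runs the given notifier on a single daemon thread at the computed interval.
     * Returns null when WATCHDOG_USEC is absent or invalid (watchdog not enabled).
     */
    static ScheduledExecutorService start(Runnable notifier) {
        long intervalMs = pingIntervalMillis(System.getenv("WATCHDOG_USEC"));
        if (intervalMs < 0) {
            return null;
        }
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "systemd-watchdog");
                t.setDaemon(true); // do not keep the JVM alive just for pings
                return t;
            });
        scheduler.scheduleAtFixedRate(notifier, 0, intervalMs, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

With the task from the answer, this would be started once at application startup as WatchdogScheduler.start(new WatchdogNotifierTask()). Note that scheduleAtFixedRate stops rescheduling a task that throws, so the notifier should catch its own exceptions, as WatchdogNotifierTask does.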
Kyle Fransham answered Oct 02 '22 14:10