I am writing a system monitor for Linux and want to include some watchdog functionality. In the kernel, you can configure the watchdog to keep going even if /dev/watchdog is closed. In other words, if my daemon exits normally and closes /dev/watchdog, the system would still re-boot 59 seconds later. That may or may not be desirable behavior for the user.
I need to make my daemon aware of this setting because it will influence how I handle SIGINT. If the setting is on, my daemon would need to (preferably) start an orderly shutdown on exit or (at least) warn the user that the system is going to reboot shortly.
Does anyone know of a method to obtain this setting from user space? I don't see anything in sysconf() to get the value. Likewise, I need to be able to tell if the software watchdog is enabled to begin with.
Edit:
Linux provides a very simple watchdog interface. A process can open /dev/watchdog; once the device is opened, the kernel begins a 60-second countdown to reboot unless some data is written to that file, in which case the clock resets.
Depending on how the kernel is configured, closing that file may or may not stop the countdown. From the documentation:
The watchdog can be stopped without causing a reboot if the device /dev/watchdog is closed correctly, unless your kernel is compiled with the CONFIG_WATCHDOG_NOWAYOUT option enabled.
I need to be able to tell if CONFIG_WATCHDOG_NOWAYOUT was set from within a user space daemon, so that I can handle the shutdown of said daemon differently. In other words, if that setting is high, a simple:
# /etc/init.d/mydaemon stop
... would reboot the system in 59 seconds, because nothing is writing to /dev/watchdog any longer. So, if it's set high, my handler for SIGINT needs to do additional things (at the very least, warn the user).
I cannot find a way of obtaining this setting from user space :( Any help is appreciated.
You can check whether the file is present. If it is, a watchdog is active on your machine; otherwise it is not. Now, the /usr/bin/watchdog that you were referring to is a watchdog daemon. It runs in the background and continuously reports the system status to the watchdog (implemented either in hardware or in software).
Use the reset_enable member to enable or disable the system reset function. Use the dog_enable member to enable or disable the watchdog function. An error (EINVAL) is displayed if the watchdog is disabled but reset is enabled.
A watchdog on Linux is usually exported through a character device under /dev/watchdog. A simple API allows opening the device to enable the watchdog. Writing to it triggers the watchdog, and if the device is not cleanly closed, the watchdog will reboot the system.
A Watchdog Timer (WDT) is a hardware circuit that can reset the computer system in case of a software fault. You probably knew that already. Usually a userspace daemon will notify the kernel watchdog driver via the /dev/watchdog special device file that userspace is still alive, at regular intervals.
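To make the "notify at regular intervals" part concrete, here is a minimal sketch of a keepalive loop. The function names watchdog_ping and watchdog_keepalive_loop, the stop flag, and the interval are my own placeholders, not anything from the kernel or the question. It assumes the caller has already opened /dev/watchdog and passes in the descriptor, and it deliberately writes a zero byte rather than 'V', since 'V' is the magic close character.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Write a single zero byte to the watchdog descriptor to reset its countdown.
 * Returns 0 on success, -1 on failure (errno is left as set by write()). */
static int watchdog_ping(int fd)
{
    ssize_t n;

    do {
        n = write(fd, "\0", 1);
    } while (n == -1 && errno == EINTR);

    return n == 1 ? 0 : -1;
}

/* Ping every interval_sec seconds until *stop becomes nonzero.
 * Returns 0 when asked to stop, -1 if a ping fails. */
static int watchdog_keepalive_loop(int fd, unsigned int interval_sec,
                                   volatile sig_atomic_t *stop)
{
    while (!*stop) {
        if (watchdog_ping(fd) == -1)
            return -1;
        sleep(interval_sec); /* may return early if a signal arrives */
    }
    return 0;
}
```

A SIGINT handler would typically just set the stop flag, so that the decision about how to close the device is made in normal code after the loop returns. The interval should be comfortably shorter than the watchdog timeout.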
AHA! After digging through the kernel's linux/watchdog.h and drivers/watchdog/softdog.c, I was able to determine the capabilities of the softdog ioctl() interface. Looking at the capabilities that it announces in struct watchdog_info:
    static struct watchdog_info ident = {
        .options = WDIOF_SETTIMEOUT |
                   WDIOF_KEEPALIVEPING |
                   WDIOF_MAGICCLOSE,
        .firmware_version = 0,
        .identity = "Software Watchdog",
    };
It does support a magic close that (seems to) override CONFIG_WATCHDOG_NOWAYOUT. So, when terminating normally, I have to write a single char 'V' to /dev/watchdog then close it, and the timer will stop counting.
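The write-'V'-then-close step could look roughly like this. It is only a sketch: watchdog_magic_close is a name I made up, and it assumes fd is the descriptor your daemon already has open on /dev/watchdog. Whether the timer actually stops afterwards is up to the driver and the kernel configuration, so treat a successful return as "the magic character was written and the file was closed", nothing more.

```c
#include <errno.h>
#include <unistd.h>

/* Write the magic character 'V' to the watchdog descriptor, then close it.
 * Returns 0 if both the write and the close succeeded, -1 otherwise.
 * The descriptor is closed in either case. */
static int watchdog_magic_close(int fd)
{
    ssize_t n;
    int rc;

    do {
        n = write(fd, "V", 1);
    } while (n == -1 && errno == EINTR);

    rc = close(fd);

    return (n == 1 && rc == 0) ? 0 : -1;
}
```

Nothing else should be written between the 'V' and the close, so stop any keepalive pings before calling this.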
A simple ioctl() on a file descriptor to /dev/watchdog asking WDIOC_GETSUPPORT allows one to determine if this flag is set. Pseudo code:
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/watchdog.h>

    int fd;
    struct watchdog_info info;

    fd = open("/dev/watchdog", O_WRONLY);
    if (fd == -1) {
        perror("open");
        // abort, timer did not start - no additional concerns
    }

    if (ioctl(fd, WDIOC_GETSUPPORT, &info) == -1) {
        perror("ioctl");
        // abort, but you probably started the timer! See below.
    }

    if (WDIOF_MAGICCLOSE & info.options) {
        printf("Watchdog supports magic close char\n");
        // You have started the timer here! Handle that appropriately.
    }
When working with hardware watchdogs, you might want to open with O_NONBLOCK so that ioctl(), not open(), blocks (hence detecting a busy card).
If WDIOF_MAGICCLOSE is not supported, one should just assume that the soft watchdog is configured with NOWAYOUT. Remember, just opening the device successfully starts the countdown. If all you're doing is probing to see if it supports magic close and it does, then magic close it. Otherwise, be sure to deal with the fact that you now have a running watchdog.
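The decision described above can be written down as a small helper. This is a sketch of the heuristic only, not something the kernel provides: the enum wd_close_policy and both function names are made up for illustration, and the fd is assumed to be an already-open /dev/watchdog descriptor (so the timer is already running by the time you call it).

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Hypothetical labels for the two outcomes of the heuristic above. */
enum wd_close_policy {
    WD_MAGIC_CLOSE,     /* driver advertises WDIOF_MAGICCLOSE */
    WD_ASSUME_NOWAYOUT  /* no magic close: assume the timer cannot be stopped */
};

/* Pure decision on the option bits reported by WDIOC_GETSUPPORT. */
static enum wd_close_policy wd_policy_from_options(unsigned int options)
{
    return (options & WDIOF_MAGICCLOSE) ? WD_MAGIC_CLOSE : WD_ASSUME_NOWAYOUT;
}

/* Query an already-open watchdog descriptor.
 * Returns 0 and stores the policy in *out, or -1 if the ioctl fails
 * (in which case the caller should still treat the timer as running). */
static int wd_query_policy(int fd, enum wd_close_policy *out)
{
    struct watchdog_info info;

    if (ioctl(fd, WDIOC_GETSUPPORT, &info) == -1)
        return -1;

    *out = wd_policy_from_options(info.options);
    return 0;
}
```

Keeping the bit test separate from the ioctl() makes it possible to exercise the decision without touching a real watchdog device. In the WD_ASSUME_NOWAYOUT case, or when the query fails, the daemon's SIGINT path would be the place to warn the user or start an orderly shutdown.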
Unfortunately, there's no real way to know for sure without actually starting it, at least not that I could find.
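One possibility that is not part of the answer above, so treat it as an assumption to verify on your target: kernels built with CONFIG_WATCHDOG_SYSFS expose per-device attributes under /sys/class/watchdog/, including a nowayout file that reads as 0 or 1. If that file exists on your system, it can be read without opening /dev/watchdog. A minimal sketch (read_nowayout_flag is a made-up name, and the path is passed in rather than hard-coded because the device may not be watchdog0):

```c
#include <stdio.h>

/* Read a sysfs-style flag file whose content is "0\n" or "1\n".
 * Returns 1 or 0 for those values, and -1 if the file cannot be
 * opened or does not start with '0' or '1'. */
static int read_nowayout_flag(const char *path)
{
    FILE *f = fopen(path, "r");
    int c;

    if (f == NULL)
        return -1;

    c = fgetc(f);
    fclose(f);

    if (c == '1')
        return 1;
    if (c == '0')
        return 0;
    return -1;
}
```

A call such as read_nowayout_flag("/sys/class/watchdog/watchdog0/nowayout") returning -1 only means the attribute is unavailable, in which case the ioctl() probing described above is still the fallback.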
A watchdog guards against hard-locking the system, whether because of a software crash or a hardware failure.
What you need is a daemon-monitoring daemon (dmd). Check out 'monit'.