I'm trying to test a behaviour which is hard to reproduce in a controlled environment.
Use case: Linux systems, usually Red Hat EL 5 or 6 (we're just starting with RHEL 7 and systemd, so it's currently out of scope).
There are situations where I need to restart a service. The script we use for stopping the service usually works quite well; it sends a SIGTERM to the process, which is designed to handle it. If the process doesn't handle the SIGTERM within a timeout (usually a couple of minutes), the script sends a SIGKILL and then waits a couple of minutes more.
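For context, the stop logic is roughly like this (a simplified sketch, not the actual script; the pidfile path and the timeouts are just placeholders):

#!/bin/sh
# Illustrative stop sequence: SIGTERM, wait, then SIGKILL, wait again.
PID=$(cat /var/run/myservice.pid)         # hypothetical pidfile location

kill -TERM "$PID"
for i in $(seq 1 120); do                 # wait up to ~2 minutes for a clean exit
    kill -0 "$PID" 2>/dev/null || exit 0
    sleep 1
done

kill -KILL "$PID"
for i in $(seq 1 120); do                 # then wait a couple of minutes more
    kill -0 "$PID" 2>/dev/null || exit 0
    sleep 1
done

echo "process $PID is still alive after SIGKILL" >&2
exit 1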
The problem is: in some (rare) situations, the process doesn't exit after a SIGKILL. This usually happens when it's badly stuck in a system call, possibly because of a kernel-level issue (a corrupt filesystem, an unresponsive NFS mount, or something equally bad requiring manual intervention).
A bug arose when the script didn't realize that the "old" process hadn't actually exited and started a new process while the old one was still running. We're fixing this with a stronger locking system (so that at least the new process doesn't start if the old one is still running), but I find the whole thing difficult to test because I haven't found a way to simulate a hard-stuck process.
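The locking we're adding is roughly along these lines (a minimal sketch assuming flock(1) from util-linux is available; the lock file and daemon path are illustrative):

#!/bin/sh
# Refuse to start a new instance while the old one (stuck or not)
# still holds the lock.
exec 9>/var/lock/subsys/myservice.lock    # example lock file
if ! flock -n 9; then
    echo "previous instance still running (or stuck), refusing to start" >&2
    exit 1
fi
exec /usr/sbin/myservice    # fd 9 stays open, so the lock is held for the daemon's lifetime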
So, the question is:
How can I manually simulate a process that doesn't exit when a SIGKILL is sent to it, even by a privileged user?
If your process is stuck doing I/O, you can simulate the situation this way:
# Requires an existing volume group "vgtest" with at least 2 GB free.
lvcreate -n lvtest -L 2G vgtest                   # create a 2 GB test LV
mkfs.ext3 -m0 /dev/vgtest/lvtest                  # filesystem with no reserved blocks
mount /dev/vgtest/lvtest /mnt
dmsetup suspend /dev/vgtest/lvtest && dd if=/dev/zero of=/mnt/file.img bs=1M count=2048 &
This way the dd process will get stuck waiting for I/O in uninterruptible sleep (D state) and will not respond to any signal, including SIGKILL. Note that on recent kernels, signals are no longer ignored when a process is waiting for I/O on an NFS filesystem, so NFS alone isn't a reliable way to reproduce this.
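To verify the effect and clean up afterwards (assuming the volume names above and that this is the only dd running), something like this should work:

ps -o pid,stat,wchan,cmd -C dd        # dd should show state "D" (uninterruptible sleep)
kill -KILL $(pgrep -x dd)             # has no visible effect while the device is suspended
dmsetup resume /dev/vgtest/lvtest     # resume I/O; dd then dies from the pending SIGKILL
umount /mnt
lvremove vgtest/lvtest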