I have a process that I would like to kill before restarting a service. Someone has written the following set of scripts to kill the process:
ps -ef |grep "process_name" | awk '{print "kill -15 " $2}'> /projects/test/kill.sh
# Run the kill script
/projects/test/kill.sh
And then again:
ps -ef | grep "process_name" | awk '{print "kill -9 " $2}'> /projects/test/kill.sh
# Run the kill script
/projects/test/kill.sh
# Finally
service restart command here
# The problem here is that service does not
# restart properly sometimes, as it thinks
# that process is still running.
As I understand it, kill -15 gracefully kills the process. But then right away they have the kill -9 as well.
So if a process was getting killed in the first command, what happens when kill -9 is also run on the same process? Or will the ps -ef even list out that process since it has been marked for kill?
You are correct that kill -15 asks a process to shut down gracefully. But kill only sends the signal and returns immediately; it does not wait for the process to exit. So the script above looks up the PID and asks the process to stop gracefully, and if that has not worked, kill -9 is performed. The only way it knows whether kill -15 worked is the second ps | grep. If kill -15 was successful, that PID no longer exists, so the second grep finds nothing for it (apart from the grep command itself, which usually matches its own entry in the ps output and produces a harmless kill for a PID that has already exited).
So really, kill -9 only has an effect if kill -15 has not yet stopped the program. The problem with this approach is that gracefully stopping a process can take some time, depending on the program. So IMHO there needs to be a wait period, or a sleep of a few seconds, to give the process time to shut down after kill -15. With the approach above, kill -9 is almost always invoked, since the script allows hardly any time for the process to shut down properly. If the process is still handling the kill -15 when kill -9 arrives, the kill -9 simply overrides it and stops the process immediately.
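The wait period described above could look something like the following. This is only a minimal sketch: the function name stop_gracefully and the 10 second default are made up for illustration, and it assumes the script runs as a user that is allowed to signal the process (otherwise kill -0 fails and the process is treated as gone).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script): send SIGTERM,
# wait up to a grace period, and only then fall back to SIGKILL.
# Usage: stop_gracefully PID [GRACE_SECONDS]
stop_gracefully() {
    local pid=$1
    local grace=${2:-10}
    local waited=0

    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    kill -0 "$pid" 2>/dev/null || return 0     # already gone, nothing to do

    kill -TERM "$pid" 2>/dev/null              # same as kill -15

    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$grace" ]; then
            echo "PID $pid still alive after ${grace}s, sending SIGKILL" >&2
            kill -KILL "$pid" 2>/dev/null      # same as kill -9
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Calling stop_gracefully "$pid" 20 before the service restart command would give the process up to 20 seconds to exit on its own, and kill -9 would only be sent if it is still there after that.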
If you have the option to refactor, on Linux you can check for /proc/$PID as a more efficient way to detect whether a process is running.
stopSvc() {
  local svc=$1 x pid
  # Take the PID (second column) from the first matching line of the ps output
  read -r x pid x < <( ps -fu "$App_user" | grep -E " ($App_baseDIR/$1/|)$svc.jar$" ||: )
  [[ -n "$pid" ]] || return 0                   # No match: nothing to stop
  local -i starting="$(date +%s)"               # Linux epoch timestamp in seconds
  while [[ -d "/proc/$pid" ]]
  do ps -fp "$pid"
     kill -TERM "$pid"
     if (( ( $(date +%s) - starting ) < 20 ))   # Been trying for less than 20 seconds
     then sleep 2
          date
     else echo "$svc is hung - using a hard stop"
          kill -KILL "$pid"
          break
     fi
  done
  sleep 2
  [[ -d "/proc/$pid" ]] && return 1 || return 0 # Flip the return: fail if the process survived
}
Basically, the kill -15 is a term signal, which the process can catch to trigger a graceful shutdown: closing pipes, sockets, and files, cleaning up temporary space, etc. So for it to be useful, the process has to be given some time to do that.
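To make "catching" the term signal concrete, here is a small sketch of a worker that cleans up after itself when it receives kill -15. The names run_worker and cleanup, and the scratch file, are hypothetical and only there for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical worker: removes its scratch file when it receives SIGTERM.
# Usage: run_worker SCRATCH_FILE &
run_worker() {
    local scratch=$1

    cleanup() {
        rm -f "$scratch"     # tidy up temporary space
        exit 0               # then leave with a clean exit status
    }
    trap cleanup TERM        # run cleanup when kill -15 arrives

    echo "working" > "$scratch"
    while :; do
        sleep 1              # stand-in for real work
    done
}
```

Note that the shell only runs the trap once the current foreground command (here sleep 1) has finished, so even this trivial worker needs up to a second to react. That delay is exactly why the caller has to wait a little after sending kill -15.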
The -9 is a kill and can't be caught. It's the Big Hammer that you use to squish the jobs that are misbehaving, and should be reserved for those cases.
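The difference between the two signals can be seen with a process that deliberately ignores the term signal. This is just an illustrative sketch; the function name stubborn is made up.

```shell
#!/usr/bin/env bash
# Hypothetical stubborn process: ignores SIGTERM entirely.
stubborn() {
    trap '' TERM                  # an empty handler means "ignore this signal"
    while :; do
        sleep 1
    done
}

stubborn &
pid=$!
sleep 1                           # give it a moment to install the trap

kill -TERM "$pid"                 # ignored by the process
sleep 1
kill -0 "$pid" 2>/dev/null && echo "still running after SIGTERM"

kill -KILL "$pid"                 # cannot be caught, blocked, or ignored
status=0
wait "$pid" || status=$?
echo "exit status: $status"       # 128 + 9, i.e. killed by signal 9
```

The process gets no chance to clean up in the second case, which is why the -9 should stay the last resort.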
You are totally right; this makes little sense. If you're going to use the -9 so soon, you might as well skip the careless attempt at better practice and just remove the -15.