My usual way of testing the notification and escalation chain is to cause a real failure, for example by blocking a port.
But this is thoroughly unsatisfying. I don't want downtime recorded in Nagios when there was none, and I don't want to wait.
Does anyone know a way to test a notification chain without causing an outage? For example, something like this:
$ ./check_notifications_chain <service|host> <time down>
at <x> minutes notification email sent to group <people>
at <2x> minutes notification email sent to group <people>
at <3x> minutes escalated to group <management>
at <200x> rm -rf; shutdown -h now executed.
Extending this paradigm, I might make the notification chain a Nagios check in itself, but I'll stop here before my brain explodes.
Anyone?
If you only want to verify that the email alerts are working properly, you could create a simple test service that generates a warning once a day.
test_alert.sh:
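#!/bin/bash
# Nagios test plugin: WARNING (exit 1) between 19:00 and 19:20 UTC, OK (exit 0) otherwise.
now=$(date -u +%H%M)
echo "$now"
echo "Nagios test script. Intentionally generates a warning daily."
# 10# forces base 10 so times such as 0830 are not parsed as invalid octal.
if (( 10#$now >= 1900 && 10#$now <= 1920 )); then
    exit 1   # WARNING
else
    exit 0   # OK
fi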
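The time window in test_alert.sh can only be observed once a day. A minimal sketch, assuming the comparison is moved into a function that takes the time as an argument, lets you check the logic for any time without waiting. The names in_alert_window and plugin_status_for are hypothetical helpers for illustration, not part of Nagios:

```shell
#!/bin/bash
# Hypothetical helper: succeeds when HHMM (UTC, e.g. 1905) lies in the 19:00-19:20 window.
in_alert_window() {
    local hhmm=$1
    # 10# forces base 10 so values such as 0830 are not read as octal.
    (( 10#$hhmm >= 1900 && 10#$hhmm <= 1920 ))
}

# Hypothetical helper: print the exit status the plugin would return for HHMM.
plugin_status_for() {
    if in_alert_window "$1"; then
        echo 1   # WARNING
    else
        echo 0   # OK
    fi
}

plugin_status_for 1905   # inside the window
plugin_status_for 0830   # outside the window
```

The first call prints 1 and the second prints 0, matching the exit statuses the script would hand back to Nagios at those times.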
commands.cfg:
define command{
    command_name test_alert
    command_line /bin/bash /usr/local/scripts/test_alert.sh
}
services.cfg:
define service {
    host localhost
    service_description Test Alert
    check_command test_alert
    use generic-service
}
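Before waiting for Nagios to schedule the check, it can help to run the plugin by hand and see which state its exit status maps to. The sketch below assumes the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and that Nagios takes the first line of output as the status text. The names state_name and run_plugin are hypothetical helpers, and the demo command stands in for the real script:

```shell
#!/bin/bash
# Map a plugin exit status to the service state name Nagios uses.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # 3, and in this sketch anything else
    esac
}

# Hypothetical helper: run a plugin command, then print its state and first output line.
run_plugin() {
    local output status
    output=$("$@")
    status=$?
    printf '%s: %s\n' "$(state_name "$status")" "$(printf '%s\n' "$output" | head -n 1)"
}

# Stand-in for a real plugin: prints one line and exits with status 1.
run_plugin sh -c 'echo "demo output"; exit 1'
```

To try it against the test service's script, the call would be run_plugin /bin/bash /usr/local/scripts/test_alert.sh, which should report WARNING only during the 19:00 to 19:20 UTC window.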