Why is there no easy way to get notifications if a pod becomes unhealthy and is restarted?
To me, it suggests I shouldn't care that a pod was restarted, but why not?
If a pod/container crashes for some reason, Kubernetes is supposed to provide the reliability/availability to start it somewhere else in the cluster. Having said that, you probably still want warnings and alerts (e.g. if a pod goes into a CrashLoopBackOff).
Although you can write your own tool that watches for specific events in your cluster and then alerts/warns on those, there is also existing tooling that does this for you.
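If you do decide to roll your own, here is a minimal sketch using the official Kubernetes Python client (pip install kubernetes); the print call is a stand-in for whatever alerting integration you actually use:

from kubernetes import client, config, watch

# Load credentials: kubeconfig when run locally,
# or config.load_incluster_config() when run inside the cluster.
config.load_kube_config()
v1 = client.CoreV1Api()

# Stream pod changes from the APIServer and flag CrashLoopBackOff.
w = watch.Watch()
for event in w.stream(v1.list_pod_for_all_namespaces):
    pod = event["object"]
    for status in pod.status.container_statuses or []:
        waiting = status.state.waiting if status.state else None
        if waiting and waiting.reason == "CrashLoopBackOff":
            # Stand-in for a real alert (Slack webhook, PagerDuty, etc.).
            print(f"{pod.metadata.namespace}/{pod.metadata.name}: "
                  f"CrashLoopBackOff, {status.restart_count} restarts")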
Think of Pods as ephemeral entities - they can run on different nodes, they can crash, they can start again...
Kubernetes is responsible for handling the lifecycle of a pod. Your job is to tell it where to run (affinity rules) and how to tell whether a pod is healthy (liveness/readiness probes).
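As a sketch of the "healthy" part, this is a liveness probe defined through the official Python client; the /healthz path, port, and image are placeholder assumptions:

from kubernetes import client

# A liveness probe tells Kubernetes how to decide a container is healthy;
# if it fails repeatedly, the kubelet restarts the container.
container = client.V1Container(
    name="web",
    image="example/web:latest",  # hypothetical image
    liveness_probe=client.V1Probe(
        http_get=client.V1HTTPGetAction(path="/healthz", port=8080),
        initial_delay_seconds=5,   # give the app time to boot
        period_seconds=10,         # probe every 10 seconds
        failure_threshold=3,       # restart after 3 consecutive failures
    ),
)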
There are many ways of monitoring pod crashes. For example, Prometheus has a great integration with Kubernetes.
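With kube-state-metrics installed, Prometheus scrapes a kube_pod_container_status_restarts_total counter per container. A small sketch that checks it over Prometheus's HTTP API (the URL and the one-hour window are assumptions):

import requests

PROM_URL = "http://prometheus.monitoring:9090"  # hypothetical in-cluster address
# Pods whose containers restarted at least once in the last hour.
QUERY = "increase(kube_pod_container_status_restarts_total[1h]) > 0"

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": QUERY})
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    m = result["metric"]
    print(f'{m["namespace"]}/{m["pod"]} ({m["container"]}): '
          f'{float(result["value"][1]):.0f} restart(s) in the last hour')

In practice you would put the same expression in a Prometheus alerting rule and route notifications through Alertmanager instead of polling.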
I wrote an open source tool to do this called Robusta. (Yes, it's named after the coffee.)
You can send the notifications to multiple destinations - Slack, for example.
Under the hood we're using our own fork of Kubewatch to track APIServer events, with multiple features added on top, like fetching logs.
You define the triggers and the actions in YAML:
- triggers:
  - on_pod_update: {}
  actions:
  - restart_loop_reporter:
      restart_reason: CrashLoopBackOff
  - image_pull_backoff_reporter:
      rate_limit: 3600
Each action is defined with a Python function, but you typically don't need to write them yourself because we have 50+ built-in actions. (See some examples here.)
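If you do need a custom one, an action is just a decorated Python function. This sketch follows the pattern in Robusta's docs, but treat the exact imports and class names as assumptions and check the current documentation:

# Hypothetical custom action; @action, PodEvent, and MarkdownBlock follow
# the documented pattern, but verify the exact API before relying on it.
from robusta.api import action, PodEvent, MarkdownBlock

@action
def report_restart(event: PodEvent):
    # Attach extra context to whatever notification the playbook sends.
    pod = event.get_pod()
    event.add_enrichment([
        MarkdownBlock(
            f"Pod *{pod.metadata.name}* in namespace "
            f"*{pod.metadata.namespace}* was restarted."
        ),
    ])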