
Monitoring OOM with Prometheus

Tags: prometheus

I'd like to use Prometheus to monitor occurrences of the system OOM killer on Debian / Ubuntu. In my particular case, Redis is sometimes killed because of OOM, and the existing low-memory-available alert is not triggered because it happens too fast. I'd like the solution to be as smart and universal as possible without spending a lot of time on it, so let's not focus on Redis itself. The ideas I have so far:

  • use existing memory-related Prometheus metrics, but only alert on extremely low values (e.g. 1% memory left) with a short "for" duration (e.g. 5 seconds) - this might work and is very simple to implement, but I believe it might be unreliable (it may not fire on every OOM, and it may fire when no OOM happened)
  • create a bash script that periodically checks dmesg and creates metrics based on appropriate grep results - this is likely to work, but it may be hard to tell dmesg messages that were already grepped from new ones (so as not to alert on the same OOM twice); this solution is also not very elegant
  • create a custom Prometheus exporter - if written properly it is likely to work as expected, but creating it might be a lot of work, which I'd like to avoid

I'd like to ask for your suggestions and opinions. Thanks!

theo asked Mar 11 '26 15:03



1 Answer

The node_vmstat_oom_kill metric from the node exporter will tell you this.
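To illustrate why a cumulative count avoids the "same OOM twice" problem from the dmesg idea, here is a minimal Python sketch. It assumes that the metric mirrors the `oom_kill` field of `/proc/vmstat` (which, as far as I know, is only present on reasonably recent kernels); the function names `parse_vmstat` and `new_oom_kills` are made up for this example and are not part of the node exporter. In Prometheus itself the same comparison would typically be written as an expression along the lines of `increase(node_vmstat_oom_kill[5m]) > 0`, where the 5m window is only an illustrative choice.

```python
from pathlib import Path


def parse_vmstat(text: str) -> dict[str, int]:
    """Parse /proc/vmstat-style 'name value' lines into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name <non-negative integer>" lines.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def new_oom_kills(previous: dict[str, int], current: dict[str, int]) -> int:
    """Return how many OOM kills happened between two samples.

    A lower current value is treated as a counter reset (e.g. a reboot),
    in which case the current value itself is the number of new kills.
    """
    before = previous.get("oom_kill", 0)
    now = current.get("oom_kill", 0)
    return now - before if now >= before else now


if __name__ == "__main__":
    # Assumption: running on Linux with /proc mounted.
    vmstat = Path("/proc/vmstat")
    if vmstat.exists():
        counters = parse_vmstat(vmstat.read_text())
        print(counters.get("oom_kill", "oom_kill not exposed by this kernel"))
```

Because only the difference between two samples matters, an OOM kill that has already been counted never triggers again, and no bookkeeping of previously seen log lines is needed.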

brian-brazil answered Mar 15 '26 10:03



