
Query for a cache hit rate graph with prometheus

I'm using Caffeine cache with a Spring Boot application. All metrics are enabled, so I have them in Prometheus and Grafana.

Based on the cache_gets_total metric, I want to build a hit rate graph.

I've tried to get the cache hits:

delta(cache_gets_total{result="hit",name="myCache"}[1m])

and all gets from cache:

sum(delta(cache_gets_total{name="myCache"}[1m]))

Both queries work correctly and return values. But when I try to compute the hit ratio, I get no data points. The query I tried:

delta(cache_gets_total{result="hit",name="myCache"}[1m]) / sum(delta(cache_gets_total{name="myCache"}[1m]))

Why doesn't this query work, and how can I get a hit rate graph based on the information I have from Spring Boot and Caffeine?

Dmytro Patserkovskyi asked Nov 03 '25 22:11


1 Answer

First of all, it is recommended to use increase() instead of delta() for calculating the increase of a counter over the specified lookbehind window. The increase() function properly handles counter resets to zero, which may happen on service restart, while delta() returns incorrect results if the lookbehind window covers a counter reset.
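To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is a simplified model, not Prometheus' actual implementation: it ignores the extrapolation Prometheus applies at window boundaries, and the function names and sample values are hypothetical.

```python
def naive_delta(samples):
    """Last value minus first value, which is how a gauge-style difference behaves."""
    return samples[-1] - samples[0]


def reset_aware_increase(samples):
    """Sum the growth between neighbouring samples, treating any drop as a restart from zero."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # The counter restarted at 0, so everything in `cur` is new growth.
            total += cur
    return total


# Hypothetical raw counter samples inside one window; the service restarts after 130.
samples = [100.0, 120.0, 130.0, 5.0, 25.0]

print(naive_delta(samples))           # -75.0
print(reset_aware_increase(samples))  # 55.0
```

The naive difference goes negative across the restart, while the reset-aware version still reports the 55 gets that actually happened.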

Next, when performing the / operation, Prometheus searches for pairs of time series with identical sets of labels and applies the operation to each pair individually. Time series returned from increase(cache_gets_total{result="hit",name="myCache"}[1m]) have at least two labels, result="hit" and name="myCache", while the time series returned from sum(increase(cache_gets_total{name="myCache"}[1m])) has zero labels, because sum() removes all labels during aggregation. No pair has identical labels, so the division returns no data points.
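The matching rule can be modelled in a few lines of Python. This is only an illustrative sketch: an instant vector is represented as a dict from a label set to a value, and the helper names and numbers are assumptions, not part of any Prometheus API.

```python
def labels(**kv):
    """Build a hashable label set from keyword arguments."""
    return frozenset(kv.items())


def divide_one_to_one(left, right):
    """Divide two vectors, pairing series only when their label sets are identical."""
    return {
        label_set: value / right[label_set]
        for label_set, value in left.items()
        if label_set in right
    }


# Hypothetical per-minute increases.
hits = {labels(name="myCache", result="hit"): 80.0}
total = {labels(): 100.0}  # sum() without by() leaves an empty label set

print(divide_one_to_one(hits, total))  # {}
```

The result is empty because the label set {name, result} on the left never equals the empty label set on the right, which mirrors the "no data points" graph from the question.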

Prometheus provides a solution to this issue: the on() and group_left() modifiers. The on() modifier limits the set of labels used when searching for time series pairs with identical label sets, while the group_left() modifier allows matching multiple time series on the left side of / with a single time series on the right side. See the Prometheus documentation on vector matching. So the following query should return the cache hit rate:

increase(cache_gets_total{result="hit",name="myCache"}[1m])
  / on() group_left()
sum(increase(cache_gets_total{name="myCache"}[1m]))
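As a rough mental model of what on() group_left() does in the query above, the following Python sketch matches series on a restricted set of labels and keeps the labels of the left-hand side. It is a simplification under stated assumptions: vectors are dicts keyed by label sets, and the helper names and values are hypothetical.

```python
def labels(**kv):
    """Build a hashable label set from keyword arguments."""
    return frozenset(kv.items())


def matching_key(label_set, on):
    """Keep only the labels listed in `on`; with on=() every series gets the same empty key."""
    return frozenset((k, v) for k, v in label_set if k in on)


def divide_on_group_left(left, right, on=()):
    """Many-to-one division: match on the `on` labels only, keep the left side's labels."""
    right_by_key = {}
    for label_set, value in right.items():
        key = matching_key(label_set, on)
        if key in right_by_key:
            # The "one" side must be unique per matching key.
            raise ValueError("multiple right-hand series share one matching key")
        right_by_key[key] = value

    result = {}
    for label_set, value in left.items():
        key = matching_key(label_set, on)
        if key in right_by_key:
            result[label_set] = value / right_by_key[key]
    return result


# Hypothetical per-minute increases.
hits = {labels(name="myCache", result="hit"): 80.0}
total = {labels(): 100.0}

ratio = divide_on_group_left(hits, total, on=())
print(ratio[labels(name="myCache", result="hit")])  # 0.8
```

With an empty on() list, both sides reduce to the same empty matching key, so the hit series finds its denominator and the result keeps the name and result labels.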

Alternative solutions exist:

  1. Remove all the labels from increase(cache_gets_total{result="hit",name="myCache"}[1m]) with the sum() function, so that both sides have empty label sets:
sum(increase(cache_gets_total{result="hit",name="myCache"}[1m]))
  /
sum(increase(cache_gets_total{name="myCache"}[1m]))
  2. Wrap the right part of the query in the scalar() function. This turns the operation into a vector / scalar division, which needs no label matching:
increase(cache_gets_total{result="hit",name="myCache"}[1m])
  /
scalar(sum(increase(cache_gets_total{name="myCache"}[1m])))

It is also possible to get the cache hit rate for all caches with a single query by using the sum(...) by (name) pattern on both sides:

sum(increase(cache_gets_total{result="hit"}[1m])) by (name)
  /
sum(increase(cache_gets_total[1m])) by (name)
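
The per-cache query can be pictured with the following Python sketch. It assumes a toy data model (a dict from label sets to per-minute increases) with hypothetical cache names and values; it only illustrates why grouping both sides by name makes the label sets line up.

```python
from collections import defaultdict


def series(**kv):
    """Build a hashable label set from keyword arguments."""
    return frozenset(kv.items())


def sum_by(vector, by):
    """Aggregate values, keeping only the labels listed in `by` (like sum(...) by (...))."""
    out = defaultdict(float)
    for label_set, value in vector.items():
        key = frozenset((k, v) for k, v in label_set if k in by)
        out[key] += value
    return dict(out)


# Hypothetical per-minute increases for two caches.
increases = {
    series(name="users", result="hit"): 90.0,
    series(name="users", result="miss"): 10.0,
    series(name="orders", result="hit"): 30.0,
    series(name="orders", result="miss"): 30.0,
}

hits = {ls: v for ls, v in increases.items() if ("result", "hit") in ls}
hit_sum = sum_by(hits, by=("name",))
all_sum = sum_by(increases, by=("name",))

# Both sides now carry exactly one label, `name`, so plain one-to-one matching works.
hit_rate = {dict(k)["name"]: hit_sum[k] / all_sum[k] for k in hit_sum if k in all_sum}
print(hit_rate)  # {'users': 0.9, 'orders': 0.5}
```

Because both aggregations keep the same single label, each cache's hits are divided by that cache's own total.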

valyala answered Nov 07 '25 10:11


