We graph a timeseries with sum(increase(foo_requests_total[1m])) to show the number of foo requests per minute. Requests come in quite sporadically, just a couple of requests per day. The value shown in the graph is always 1.3333. Why is the value not 1? There was one request during this minute.
Prometheus' increase function calculates the counter increase over a specified time frame. The following PromQL expression calculates the number of job executions over the past 5 minutes.
irate should only be used when graphing volatile, fast-moving counters. Use rate for alerts and slow-moving counters, as brief changes in the rate can reset the FOR clause and graphs consisting entirely of rare spikes are hard to read.
For example, in Grafana the expression rate(http_requests_total{job="api-server"}[5m]) returns the per-second rate of HTTP requests as measured over the last 5 minutes.
The challenge with calculating this number is that we only have a few data points inside a time range, and they tend not to be at the exact start and end of that time range (1 minute here). What do we do about the time between the start of the time range and the first data point, and between the last data point and the end of the range?
We do a small bit of extrapolation to smooth this out and produce the correct result in aggregate. For very slow-moving counters like this one, it can cause artifacts.
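The "correct in aggregate" point can be illustrated with a small Python sketch. It assumes a 15-second scrape interval (not stated in the question), so each 1-minute window holds four samples that cover only 45 seconds, and it uses a deliberately simplified scaling rule rather than Prometheus' actual code. The function names and counter values are hypothetical.

```python
SCRAPE_INTERVAL = 15  # seconds; an assumption, the question does not state it
WINDOW = 60           # seconds; matches the [1m] range in the query

def scaled_increase(window_values):
    """Raw increase inside one window, scaled from the covered span to the full window."""
    raw = window_values[-1] - window_values[0]
    covered = SCRAPE_INTERVAL * (len(window_values) - 1)  # 45s for four samples
    return raw * WINDOW / covered

def total_over_two_windows(step_index):
    """Sum the increases of two adjacent windows when the counter steps
    from 7 to 8 between sample step_index - 1 and sample step_index."""
    values = [7 if i < step_index else 8 for i in range(8)]
    return scaled_increase(values[:4]) + scaled_increase(values[4:])

# The single request can land in any of four scrape gaps: three inside the
# first window, one straddling the boundary between the two windows.
results = [total_over_two_windows(k) for k in range(1, 5)]
average = sum(results) / len(results)

print(results)
print(average)
```

Three of the four placements report about 1.3333 and the boundary-straddling one reports 0, so the average over all placements comes out at roughly 1 request. A single window on its own never shows exactly 1.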
Prometheus calculates increase(foo_requests_total[1m]) at a timestamp t in the following way:
1. It selects the raw samples for the time series with the foo_requests_total name on the time range (t-1m ... t]. Note that samples at the timestamp t-1m aren't included in the selection, while samples at the timestamp t are included.
2. It calculates the difference between the last and the first raw sample on the selected time range (Prometheus may also remove possible counter resets, but let's skip this step for the sake of clarity).
3. It extrapolates the calculated difference if the first and/or the last raw sample are located too far from the bounds of the selected time range.
The last step may result in fractional increase() values over integer counters, as seen in the original question. See this issue for more details. Note also that increase() in Prometheus misses the difference between the first raw sample on the selected time range and the previous sample before the selected time range. This may result in smaller than expected increase() results.
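The calculation described above can be sketched in Python to show where 1.3333 comes from. This is a simplified model, not Prometheus' source code: it ignores counter resets, and the extrapolation rule (extend to the range bound when the gap is below 1.1 times the average sample spacing, otherwise extend by half the average spacing) is an assumption based on how Prometheus 2.x behaviour is commonly described. The 15-second scrape interval and the sample values are also assumptions.

```python
def extrapolated_increase(samples, range_start, range_end):
    """samples: list of (timestamp_seconds, value) tuples sorted by timestamp."""
    # Select samples on the half-open range (range_start, range_end].
    selected = [(t, v) for t, v in samples if range_start < t <= range_end]
    if len(selected) < 2:
        return None  # at least two samples are needed to compute a difference

    (first_t, first_v), (last_t, last_v) = selected[0], selected[-1]

    # Raw difference between the last and the first sample (no reset handling).
    raw_increase = last_v - first_v

    # Extrapolate the difference towards the bounds of the range.
    sampled_interval = last_t - first_t
    avg_interval = sampled_interval / (len(selected) - 1)
    threshold = avg_interval * 1.1  # assumed cutoff for "close to the bound"

    gap_to_start = first_t - range_start
    gap_to_end = range_end - last_t

    extrapolate_to = sampled_interval
    extrapolate_to += gap_to_start if gap_to_start < threshold else avg_interval / 2
    extrapolate_to += gap_to_end if gap_to_end < threshold else avg_interval / 2

    return raw_increase * extrapolate_to / sampled_interval

# Four scrapes, 15s apart, inside the window (0, 60]; one request arrives
# between the second and the third scrape.
samples = [(5, 7), (20, 7), (35, 8), (50, 8)]
result = extrapolated_increase(samples, 0, 60)
print(round(result, 4))  # 1.3333
```

The four samples span 45 seconds, the raw increase is 1, and scaling 1 by 60/45 gives the 1.3333 seen in the graph.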
Prometheus developers are going to fix these issues (see this design doc). In the meantime, try VictoriaMetrics: its increase() function properly returns the expected integer result without any extrapolation over integer counters.
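To make the idea of an increase without extrapolation concrete, here is a minimal Python sketch. It is only an illustration of the approach described in this answer (take the difference against the last sample before the window) and is not VictoriaMetrics' actual implementation. The function name and sample values are hypothetical, and counter resets are ignored.

```python
def increase_without_extrapolation(samples, range_start, range_end):
    """samples: list of (timestamp_seconds, value) tuples sorted by timestamp."""
    before = None    # last value seen at or before range_start
    in_window = []   # values on the range (range_start, range_end]
    for t, v in samples:
        if t <= range_start:
            before = v
        elif t <= range_end:
            in_window.append(v)
    if not in_window:
        return 0
    # Use the previous sample as the baseline when there is one, so an
    # increment that straddles the window boundary is still counted.
    baseline = before if before is not None else in_window[0]
    return in_window[-1] - baseline

# Request arrives between the scrapes at t=50 and t=65, across the boundary
# of the window (60, 120].
samples = [(35, 7), (50, 7), (65, 8), (80, 8), (95, 8), (110, 8)]
crossing = increase_without_extrapolation(samples, 60, 120)

# Request arrives inside the window (0, 60].
inside = increase_without_extrapolation([(-10, 7), (5, 7), (20, 7), (35, 8), (50, 8)], 0, 60)

print(crossing, inside)  # 1 1
```

Both cases return the integer 1, including the boundary-crossing request that a first-to-last difference inside the window would miss.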