 

Why does increase() return a value of 1.33 in Prometheus?

Tags:

prometheus

We graph a time series with sum(increase(foo_requests_total[1m])) to show the number of foo requests per minute. Requests come in quite sporadically, just a couple per day. The value shown in the graph is always 1.3333. Why is the value not 1? There was only one request during this minute.

James asked Jul 29 '16 19:07





2 Answers

The challenge with calculating this number is that we only have a few data points inside a time range, and they tend not to be at the exact start and end of that time range (1 minute here). What do we do about the time between the start of the time range and the first data point, and likewise the time between the last data point and the end of the range?

We do a small bit of extrapolation to smooth this out and produce the correct result in aggregate. For very slow-moving counters like this one, it can cause artifacts such as the fractional value seen here.
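To see where a value like 1.3333 can come from, here is a rough arithmetic sketch. It assumes a 15-second scrape interval, which the question does not state: a 1-minute window then holds four samples that together cover only 45 seconds, and scaling the observed increase up to the full window gives 4/3. The function name scaled_increase is made up for this illustration.

```python
def scaled_increase(raw_increase, covered_seconds, window_seconds):
    """Scale an observed increase from the covered span up to the full window."""
    return raw_increase * window_seconds / covered_seconds


# Assumed numbers: 1 request seen, samples cover 45s of a 60s window.
value = scaled_increase(1, 45, 60)
print(round(value, 4))  # prints 1.3333
```

With a different scrape interval the factor changes, so the exact fraction depends on how many samples fall inside the window.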

brian-brazil answered Oct 18 '22 04:10



Prometheus calculates increase(foo_requests_total[1m]) at a timestamp t in the following way:

  1. For each time series named foo_requests_total, it selects all the raw samples in the time range (t-1m ... t]. Note that samples at the timestamp t-1m are excluded from the selection, while samples at the timestamp t are included.
  2. It calculates the difference d between the last and the first raw sample in the selected time range (Prometheus also compensates for possible counter resets, but let's skip this step for the sake of clarity).
  3. It extrapolates the calculated difference d if the first and/or the last raw sample is located too far from the bounds of the selected time range.

The last step may result in fractional increase() values over integer counters, as seen in the original question. See this issue for more details. Note also that increase() in Prometheus misses the difference between the first raw sample in the selected time range and the last sample before that range. This may result in smaller than expected increase() results.
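The three steps can be sketched in Python as follows. This is a simplified model, not Prometheus's actual code: the function name increase_at, the sample layout, and the 15-second spacing are hypothetical, and the extrapolation rule (extend to the window bound when the gap is below 1.1 times the average sample spacing, otherwise by half that spacing) is an assumption about how the extrapolation behaves. Counter resets and other refinements are left out.

```python
def increase_at(series, t, window=60.0):
    """Approximate increase() at time t for a list of (timestamp, value) samples."""
    # Step 1: select samples in (t - window, t]; t - window itself is excluded.
    selected = [(ts, v) for ts, v in series if t - window < ts <= t]
    if len(selected) < 2:
        return None  # not enough samples to compute a difference

    # Step 2: difference between the last and the first selected sample
    # (counter resets are ignored in this sketch).
    first_ts, first_v = selected[0]
    last_ts, last_v = selected[-1]
    d = last_v - first_v

    # Step 3: extrapolate d toward the window bounds (assumed rule).
    covered = last_ts - first_ts
    avg_gap = covered / (len(selected) - 1)
    threshold = avg_gap * 1.1
    to_start = first_ts - (t - window)
    to_end = t - last_ts
    extended = covered
    extended += to_start if to_start < threshold else avg_gap / 2
    extended += to_end if to_end < threshold else avg_gap / 2
    return d * extended / covered


# One request arrives between the samples at 30s and 45s.
inside = [(0, 10), (15, 10), (30, 10), (45, 11), (60, 11)]
print(round(increase_at(inside, t=60), 4))  # prints 1.3333

# One request arrives between 0s and 15s; the sample at 0s is outside the window.
at_edge = [(0, 10), (15, 11), (30, 11), (45, 11), (60, 11)]
print(increase_at(at_edge, t=60))  # prints 0.0
```

The second call shows the missed increase described above: the counter went from 10 to 11, but the sample holding the old value falls outside (t-1m ... t], so the difference inside the window is zero.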

Prometheus developers are going to fix these issues; see this design doc. In the meantime, try VictoriaMetrics: its increase() function returns the expected integer result over integer counters, without any extrapolation.

valyala answered Oct 18 '22 04:10
