
Do I understand Prometheus's rate vs increase functions correctly?

Tags:

prometheus

I have read the Prometheus documentation carefully, but it's still a bit unclear to me, so I am here to get confirmation of my understanding.

(Please note that, to keep the examples as simple as possible, I have used one second for both the scrape interval and the time ranges, even if that is not possible in practice.)

Suppose we scrape a counter every second and the counter's value is 30 right now. We have the following time series for it:

second   counter_value   increase calculated by hand (call it ICH from now on)
1        1               1
2        3               2
3        6               3
4        7               1
5        10              3
6        14              4
7        17              3
8        21              4
9        25              4
10       30              5

We want to run some queries on this dataset.

1. rate()
The official documentation states:
"rate(v range-vector) : calculates the per-second average rate of increase of the time series in the range vector."

In layman's terms, does this mean that we get a value for every second, and that the value for a given second is the average per-second increment over the given range?

Here is what I mean:
rate(counter[1s]): will match the ICH, because the average is calculated from one value only.
rate(counter[2s]): will take the average of the increment over 2 seconds and distribute it among those seconds.
So in the first 2 seconds we got a total increment of 3, which means the average is 1.5/sec. Final result:

second   result
1        1.5
2        1.5
3        2
4        2
5        3.5
6        3.5
7        3.5
8        3.5
9        4.5
10       4.5

rate(counter[5s]): will take the average of the increment over 5 seconds and distribute it among those seconds.
The same as for [2s], but we calculate the average from the total increment over 5 seconds. Final result:

second   result
1        2
2        2
3        2
4        2
5        2
6        4
7        4
8        4
9        4
10       4

So the larger the time range, the smoother the result we get. And the sum of these increases will match the actual counter.

2. increase()
The official documentation states:
"increase(v range-vector) : calculates the increase in the time series in the range vector."

To me this means it won't distribute the average among the seconds, but will instead show the single increment for the given range (with extrapolation).
increase(counter[1s]): As I understand it, this will match the ICH and the rate for 1s, simply because the total range and the base granularity of rate are the same.
increase(counter[2s]): The first 2 seconds gave us a total increment of 3, so second 2 will get the value 3, and so on...

second   result
1        3*
2        3
3        4*
4        4
5        7*
6        7
7        7*
8        7
9        9*
10       9

* As I understand it, these are the extrapolated values that cover every second.

Do I understand this correctly, or am I far off?

beatrice asked Feb 02 '19 15:02



1 Answer

In an ideal world (where your samples' timestamps are exactly on the second and your rule evaluation happens exactly on the second) rate(counter[1s]) would return exactly your ICH value and rate(counter[5s]) would return the average of that ICH and the previous 4. Except the ICH at second 1 is 0, not 1, because no one knows when your counter was zero: maybe it incremented right there, maybe it got incremented yesterday, and stayed at 1 since then. (This is the reason why you won't see an increase the first time a counter appears with a value of 1 -- because your code just created and incremented it.)
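As a rough illustration of the ideal-world case described above, here is a minimal Python sketch (not Prometheus code) that derives the ICH from the question's counter values and averages it over a window. The helper name ideal_rate is made up for this example, and the first ICH is taken as 0, as argued above.

```python
# Counter values scraped at seconds 1..10 (taken from the question).
counter = [1, 3, 6, 7, 10, 14, 17, 21, 25, 30]

# ICH: the first sample has no predecessor, so its increase is treated as 0.
ich = [0] + [b - a for a, b in zip(counter, counter[1:])]


def ideal_rate(ich, window):
    """Average of the current ICH and the previous (window - 1) ICH values.

    Returns None for seconds that do not yet have `window` values of history.
    This is a simplified model of the ideal case, not Prometheus' algorithm.
    """
    return [
        sum(ich[i - window + 1 : i + 1]) / window if i >= window - 1 else None
        for i in range(len(ich))
    ]


rate_1s = ideal_rate(ich, 1)  # same numbers as the ICH column
rate_5s = ideal_rate(ich, 5)  # smoothed: average of 5 consecutive ICH values

for second, (r1, r5) in enumerate(zip(rate_1s, rate_5s), start=1):
    print(second, r1, r5)
```

Under these assumptions the 1s column reproduces the ICH (with 0 at second 1), and the 5s column only starts at second 5, once five values are available.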

increase(counter[5s]) is exactly rate(counter[5s]) * 5 (and increase(counter[2s]) is exactly rate(counter[2s]) * 2).

Now what happens in the real world is that your samples are not collected exactly every second on the second and rule evaluation doesn't happen exactly on the second either. So if you have a bunch of samples that are (more or less) 1 second apart and you use Prometheus' rate(counter[1s]), you'll get no output. That's because what Prometheus does is it takes all the samples in the 1 second range [now() - 1s, now()] (which would be a single sample in the vast majority of cases), tries to compute a rate and fails.

If you query rate(counter[5s]), on the other hand, Prometheus will pick all the samples in the range [now() - 5s, now()] (5 samples, covering approximately 4 seconds on average, say [t1, v1], [t2, v2], [t3, v3], [t4, v4], [t5, v5]) and (assuming your counter doesn't reset within the interval) will return (v5 - v1) / (t5 - t1). I.e. it actually computes the rate of increase over ~4s rather than 5s.
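The two real-world cases above can be sketched in a few lines of Python. This is only a model of the simplified formula given here, not Prometheus' actual implementation (which also handles counter resets and other details). The timestamps, the value of now, and the function name simple_rate are all made up for illustration.

```python
# Hypothetical samples as (timestamp_in_seconds, counter_value), scraped
# roughly once per second with a little jitter. The timestamps are invented.
samples = [(5.02, 10), (6.01, 14), (7.03, 17), (8.00, 21), (9.02, 25), (10.01, 30)]


def simple_rate(samples, now, window):
    """(v_last - v_first) / (t_last - t_first) over samples in [now - window, now].

    Returns None when fewer than two samples fall inside the window.
    Assumes the counter does not reset within the window.
    """
    in_range = [(t, v) for t, v in samples if now - window <= t <= now]
    if len(in_range) < 2:
        return None
    (t_first, v_first), (t_last, v_last) = in_range[0], in_range[-1]
    return (v_last - v_first) / (t_last - t_first)


now = 10.5  # hypothetical evaluation time, not aligned with any scrape
print(simple_rate(samples, now, 1))  # only one sample in range -> None
print(simple_rate(samples, now, 5))  # five samples spanning about 4 seconds
```

With a 1s window only the sample at 10.01 is in range, so there is nothing to compute a rate from. With a 5s window the sample at 5.02 falls just outside, leaving five samples that span about 4 seconds.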

increase(counter[5s]) will return (v5 - v1) / (t5 - t1) * 5, so the rate of increase over ~4 seconds, extrapolated to 5 seconds.

Due to the samples not being exactly spaced, both rate and increase will often return floating point values for integer counters (which makes obvious sense for rate, but not so much for increase).
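To make the fractional results concrete, here is a small Python sketch of the simplified increase formula above, (v5 - v1) / (t5 - t1) * 5. The jittered timestamps and the name simple_increase are invented for the example, and the sketch ignores counter resets and the finer points of how Prometheus extrapolates.

```python
# Hypothetical samples inside a 5s window: (timestamp_in_seconds, counter_value).
# The counter values come from the question; the timestamps are invented.
samples = [(6.03, 14), (7.01, 17), (8.02, 21), (8.98, 25), (10.01, 30)]


def simple_increase(samples, window):
    """Rate between the first and last sample, scaled up to the full window.

    Expects at least two samples and no counter reset between them.
    """
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    rate = (v_last - v_first) / (t_last - t_first)
    return rate * window


raw_delta = samples[-1][1] - samples[0][1]  # integer change actually observed
inc = simple_increase(samples, 5)           # extrapolated to 5 seconds

print(raw_delta)
print(inc)
```

Here the counter really went up by 16 over roughly 3.98 seconds, but scaling that to a 5 second window gives a value a little above 20 that is not a whole number, even though the counter itself only ever holds integers.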

Alin Sînpălean answered Oct 16 '22 11:10