I have a Prometheus counter, for which I want to get its rate on a time range (the real target is to sum the rate, and sometimes use histogram_quantile on that for histogram metric).
However, I've got multiple machines running that kind of job, each one sets its own instance label. This causes different inc operations on this counter in different machines to create different entities of the counter, as the combination of labels values is unique.
The problem is that rate() works separately on each such counter entity.
The result is that counter entities with unique combinations don't get into account for rate().
For example, if I've got:
mycounter{aaa="1",instance="1.2.3.4:6666",job="job1"} value: 1
mycounter{aaa="2",instance="1.2.3.4:6666",job="job1"} value: 1
mycounter{aaa="2",instance="1.2.3.4:7777",job="job1"} value: 1
mycounter{aaa="1",instance="5.5.5.5:6666",job="job1"} value: 1
All counter entities are unique, so they get values of 1.
If counter labels are always unique (come from different machines), rate(mycounter[5m]) would get values of 0 in this case,
and sum(rate(mycounter[5m])) would get 0, which is not what I need!
I want to ignore the instance label so that it would refer these mycounter inc operations as they were made on the same counter entity.
In other words, I expect to have only 2 entities (they can have a common instance value or no instance label):
mycounter{aaa="1", job="job1"} value: 2
mycounter{aaa="2", job="job1"} value: 2
In such a case, inc operation in new machine (with existing aaa value) would increase some entity counter instead of adding new entity with value of 1, and rate() would get real rates for each, so we may sum() them. How do I do that?
I made several tries to solve it but all failed:
Any suggestions?
Prometheus version: 2.3.2
Thanks in Advance!
Prometheus rate function is the process of calculating the average per second rate of value increases. You would use this when you want to view how your server CPU usage has increased over a time range or how many requests come in over a time range and how that number increases.
As you might have guessed from the name, a counter counts things. It does so in the simplest way possible, as its value can only increment but never decrement¹. Whilst it isn't possible to decrement the value of a running counter, it is possible to reset a counter. A reset happens on application restarts.
irate() irate(v range-vector) calculates the per-second instant rate of increase of the time series in the range vector. This is based on the last two data points. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for.
Use the rate() / irate() functions in Prometheus to calculate the rate of increase of a Counter. By convention, the names of Counters are suffixed by _total . To create a counter use either new/1 or declare/1 , the difference is that new/ will raise Prometheus.
You'd better expose your counters at 0 on application start, if the other labels (aaa
, etc) have a limited set of possible combinations. This way rate()
function works correctly at the bottom level and sum()
will give you correct results.
If you have to do a rate()
of the sum()
, read this first:
Note that when combining
rate()
with an aggregation operator (e.g.sum()
) or a function aggregating over time (any function ending in_over_time
), always take arate()
first, then aggregate. Otherwiserate()
cannot detect counter resets when your target restarts.
If you can tolerate this (or the instances reset counters at the same time), there's a way to work around. Define a recording rule as
record: job:mycounter:sum
expr: sum without(instance) (mycounter)
and then this expression works:
sum(rate(job:mycounter:sum[5m]))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With