Difference between PromQL "by" and "without" unclear

Tags: prometheus, promql, calculation

I have a question about calculating response times with Prometheus summary metrics.

I created a summary metric whose labels include not only the service name but also the complete path and the HTTP method.

Now I am trying to calculate the average response time for the whole service. I read the article about "rate then sum", and either I do not understand how the calculation is done or the calculation is, IMHO, not correct.

As far as I have read, this should be the correct way to calculate the response time per second:

sum by(service_id) (
    rate(request_duration_sum{status_code=~"2.*"}[5m])
    /
    rate(request_duration_count{status_code=~"2.*"}[5m])
)

As I understand it, this computes the "duration per second" value (rate of sum / rate of count) for each subset and then sums those values per service_id.

This looks absolutely wrong to me, but perhaps it does not work the way I understand it.

Another way to get a similar-looking result is this:

sum without (path,host) (
    rate(request_duration_sum{status_code=~"2.*"}[5m])
    /
    rate(request_duration_count{status_code=~"2.*"}[5m])
)

But what is the difference?
What is really happening here?
And why do I only get meaningful values if I use "max" instead of "sum"?

If I ignored everything I have read, I would try it the following way:

rate(sum by(service_id) request_duration_sum{status_code=~"2.*"}[5m])
/
rate(sum by(service_id) request_duration_count{status_code=~"2.*"}[5m])

But this does not work at all (instant vector vs. range vector, and so on).


asked Jun 27 '18 14:06

eventhorizon

1 Answer

All of these examples are aggregating incorrectly, as you're averaging an average. You want:

  sum without (path,host) (
    rate(request_duration_sum{status_code=~"2.*"}[5m])
  )
/
  sum without (path,host) (
    rate(request_duration_count{status_code=~"2.*"}[5m])
  )

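This returns the average latency per status_code plus any other remaining labels. "without" drops only the listed labels and keeps all others, whereas "by" keeps only the listed labels, so sum by(service_id) would also aggregate away status_code.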
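
To make the arithmetic concrete, here is a minimal Python sketch that mimics the two aggregation orders on made-up numbers. It is not how Prometheus is implemented. The sample series, their values, and the helper names (group_key, sum_of_ratios, ratio_of_sums) are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical per-second rates, standing in for what rate(...[5m]) might return.
# Each entry: (labels, rate of request_duration_sum, rate of request_duration_count)
series = [
    ({"service_id": "shop", "status_code": "200", "path": "/fast", "host": "a"}, 9.0, 90.0),
    ({"service_id": "shop", "status_code": "200", "path": "/slow", "host": "a"}, 20.0, 10.0),
]


def group_key(labels, by=None, without=None):
    """Return the labels that survive aggregation as a sorted, hashable tuple."""
    if by is not None:
        # "by": keep only the listed labels.
        kept = {k: v for k, v in labels.items() if k in by}
    else:
        # "without": drop the listed labels, keep everything else.
        kept = {k: v for k, v in labels.items() if k not in (without or ())}
    return tuple(sorted(kept.items()))


def sum_of_ratios(series, **grouping):
    """Divide per series first, then sum (the approach from the question)."""
    out = defaultdict(float)
    for labels, duration_rate, count_rate in series:
        out[group_key(labels, **grouping)] += duration_rate / count_rate
    return dict(out)


def ratio_of_sums(series, **grouping):
    """Sum numerator and denominator separately, then divide."""
    durations = defaultdict(float)
    counts = defaultdict(float)
    for labels, duration_rate, count_rate in series:
        key = group_key(labels, **grouping)
        durations[key] += duration_rate
        counts[key] += count_rate
    return {key: durations[key] / counts[key] for key in durations}


if __name__ == "__main__":
    # Same numbers, different surviving labels.
    print(sum_of_ratios(series, by=("service_id",)))
    print(sum_of_ratios(series, without=("path", "host")))
    # Request-weighted average latency.
    print(ratio_of_sums(series, without=("path", "host")))
```

With these made-up numbers, /fast averages 0.1 s over 90 requests per second and /slow averages 2.0 s over 10 requests per second. Summing the per-path averages gives about 2.1 s, which is not a latency anyone observed, while dividing the summed rates gives 0.29 s, the request-weighted average. "max" would return 2.0 s here, the average of the slowest path, which is probably why it looks plausible without being the service average. The only difference between the "by" and "without" calls is the label set on the result: "by" leaves just service_id, while "without" also keeps status_code.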


answered Oct 22 '22 08:10

brian-brazil
