For Prometheus metrics collection, as the title says, I could not find a use case that can only be handled by the Summary type; it seems they can all be handled by the Histogram type as well.
Let's take the request duration metric as an example. No doubt this can be done perfectly well with the Summary type, but I can achieve the same effect with the Histogram type, as below:
rate(http_request_duration_seconds_sum[1s]) / rate(http_request_duration_seconds_count[1s])
The only difference I can see is that for a Summary the percentiles are computed in the client: it is made up of count and sum counters (as in the Histogram type) plus the resulting quantile values.
So I am a bit lost as to which use cases really make the Summary type necessary or unique. Please help me understand.
The Summary metric is not unique; many other instrumentation systems offer something similar, such as Dropwizard's Histogram type (it's a histogram internally, but exposed as quantiles). This is one reason it exists: so that such types from other instrumentation systems can be mapped more cleanly.
Another reason it exists is historical. In Prometheus the Summary came before the Histogram, and the general recommendation is to use a Histogram, as it's aggregatable whereas the Summary's quantiles are not. On the other hand, histograms require you to pre-select buckets in order to be aggregatable and to allow analysis over arbitrary time frames.
There is a longer comparison of the two types in the docs.
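To illustrate the point above about a Summary's quantiles not being aggregatable, here is a minimal Python sketch. The two servers, their latency distributions and the `percentile` helper are hypothetical and only serve the illustration; they are not part of any Prometheus client library.

```python
import math
import random


def percentile(values, q):
    """Nearest-rank percentile of a list of values, for 0 < q <= 1."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


random.seed(0)

# Hypothetical cluster: one busy, fast server and one lightly loaded, slow server.
fast_server = [random.uniform(0.01, 0.1) for _ in range(9900)]
slow_server = [random.uniform(1.0, 2.0) for _ in range(100)]

# What each server would export as its own pre-computed quantile="0.99" value.
p99_fast = percentile(fast_server, 0.99)
p99_slow = percentile(slow_server, 0.99)

# The actual 99th percentile over all requests in the cluster.
p99_all = percentile(fast_server + slow_server, 0.99)

# The avg() and max() style of combining per-server quantiles.
avg_of_p99 = (p99_fast + p99_slow) / 2
max_of_p99 = max(p99_fast, p99_slow)

print("actual cluster p99:", p99_all)
print("avg of per-server p99:", avg_of_p99)
print("max of per-server p99:", max_of_p99)
```

In this setup the slow server handles only 1% of the traffic, so the cluster-wide 99th percentile still falls among the fast requests, while both the average and the maximum of the per-server values are dominated by the slow server.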
The Prometheus summary metric type is useful when there is a set of pre-defined percentiles that must be exposed for some metric, such as request duration or response size, and there is no need to calculate aggregate percentiles over multiple metrics. For example, if you need to measure the 90th, 97th and 99th percentiles for request duration on a single server, then the following metrics, which make up a Prometheus summary, would be useful to export:
http_request_duration_seconds{quantile="0.99"}
http_request_duration_seconds{quantile="0.97"}
http_request_duration_seconds{quantile="0.90"}
Another common reason why users prefer the Prometheus summary type over the Prometheus histogram type is that summary metrics are easier to understand and to work with.
The summary metric type has the following limitations compared to the histogram metric type:
If the http_request_duration_seconds{quantile="0.99"} metric is exposed individually by each server in a cluster, then it is impossible to calculate the 99th percentile for request duration over all the servers in the cluster. Users sometimes use avg(http_request_duration_seconds{quantile="0.99"}) or max(http_request_duration_seconds{quantile="0.99"}) as a workaround, but the resulting value may be far from the actual percentile.
The histogram metric type in Prometheus also has its own issues:
Too low precision for calculated percentiles when the exported histogram buckets have insufficient coverage for the measurement. For example, if the http_request_duration_seconds histogram has the buckets [0-0.1], (0.1-1.0] and (1.0-10.0], and the majority of requests are executed in 0.5 seconds, then all these requests will go to the (0.1-1.0] bucket, and it is impossible to calculate any percentile with good precision from such data.
Too many exported buckets. When users run into the first issue, the most common reaction is to create a large number of buckets in order to get good coverage of the measurement. This may lead to high cardinality issues, since each bucket is exposed as a separate metric (aka time series).
Inability to aggregate histograms with distinct sets of buckets. For example, the http_request_duration_seconds histogram may have a distinct set of buckets for each monitored service. Then it is impossible to calculate a percentile for this histogram over multiple services.
These issues are solved in the VictoriaMetrics histogram type; see this article for details.
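The first histogram issue listed above (coarse buckets) can be illustrated with a small Python sketch. It estimates a quantile by linear interpolation inside the bucket that contains the target rank, which is roughly how a quantile is derived from cumulative buckets. It is a simplified stand-in written for this example, not the actual Prometheus implementation, and the function name and bucket layout are assumptions taken from the example in the text.

```python
def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound. Simplifying assumptions: the lowest bucket starts at 0 and
    observations are spread evenly inside each bucket.
    """
    total = buckets[-1][1]
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cumulative_count in buckets:
        if cumulative_count >= rank:
            in_bucket = cumulative_count - lower_count
            if in_bucket == 0:
                return upper_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative_count
    return buckets[-1][0]


# 1000 requests that all take exactly 0.5 seconds.
observations = [0.5] * 1000

# Bucket upper bounds from the example: [0-0.1], (0.1-1.0], (1.0-10.0].
bounds = [0.1, 1.0, 10.0]
buckets = [(b, sum(1 for v in observations if v <= b)) for b in bounds]

estimated_p50 = estimate_quantile(0.50, buckets)
estimated_p99 = estimate_quantile(0.99, buckets)

print("estimated p50:", estimated_p50)
print("estimated p99:", estimated_p99)
```

Every request here took 0.5 seconds, so every true percentile is 0.5, yet the estimates are spread across the whole (0.1-1.0] bucket because the bucket boundaries are the only information that survives.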