Prometheus topk returns more results than expected

Tags:

prometheus

If I use the following query

topk(5,sum(container_memory_usage_bytes{kubernetes_container_name=~".+", kubernetes_namespace=~".+"}) by (kubernetes_namespace,kubernetes_container_name))

it returns 5 results as expected.

However with

topk(5,sum(irate(container_cpu_usage_seconds_total{kubernetes_container_name=~".+", kubernetes_namespace=~".+"}[20s])) by (kubernetes_namespace,kubernetes_container_name))

around 18 results are returned. Any idea why this happens? What do I need to change in the second query to get only the top 5?

Jorrit Salverda asked Aug 05 '16 07:08




3 Answers

I had the same issue. After switching on "Instant" for the query, I got the correct number of results back.

Patrick de Kievit answered Oct 18 '22 19:10


Those are the same query from the topk standpoint; both should return no more than 5 results.

Would I be right in saying that you're not running this as a query, but actually as a graph? If so, exactly which 5 do you want chosen?

brian-brazil answered Oct 18 '22 19:10


Prometheus may return more than k time series from topk(k, ...) when building a graph in Grafana, because it selects the top k time series independently at each point on the graph. Each point may have its own set of top time series, so the final graph can contain more than k time series in total. There are two solutions for this issue:

  • Enable the instant query option in Grafana. Grafana then queries the /api/v1/query endpoint instead of the /api/v1/query_range endpoint. The /api/v1/query endpoint evaluates the query at a single timestamp only, so it consistently returns up to k time series from topk(k, ...).
  • Use one of the topk_* functions from MetricsQL, the PromQL-like query language from the VictoriaMetrics project I work on. For example, topk_max(k, ...) returns up to k time series with the maximum values over the selected time range, while topk_last(k, ...) returns up to k time series with the maximum values at the end of the selected time range.
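
To make the per-point selection concrete, here is a small Python sketch. It is a toy model, not how Prometheus or VictoriaMetrics are implemented: the series names and sample values are made up, and range_topk, topk_at and topk_by_max are hypothetical helpers that only imitate a range query, an instant query (or topk_last) and topk_max respectively.

```python
from typing import Dict, List, Set

# Made-up samples for four series at three evaluation steps (assumption:
# one value per series per step, no gaps).
SERIES: Dict[str, List[float]] = {
    "ns1/app-a": [10.0, 1.0, 1.0],
    "ns1/app-b": [8.0, 2.0, 7.0],
    "ns2/app-c": [1.0, 9.5, 2.0],
    "ns2/app-d": [2.0, 8.0, 9.0],
}


def topk_at(step: int, k: int) -> Set[str]:
    """Names of the k series with the largest value at a single step.

    This imitates an instant query evaluated at one timestamp.
    """
    ranked = sorted(SERIES, key=lambda name: SERIES[name][step], reverse=True)
    return set(ranked[:k])


def range_topk(k: int) -> Set[str]:
    """Union of the per-step winners, i.e. every series a graph would draw."""
    steps = len(next(iter(SERIES.values())))
    shown: Set[str] = set()
    for step in range(steps):
        shown |= topk_at(step, k)
    return shown


def topk_by_max(k: int) -> Set[str]:
    """Rank each series once by its maximum over the whole range.

    This only approximates the idea behind topk_max as described above.
    """
    ranked = sorted(SERIES, key=lambda name: max(SERIES[name]), reverse=True)
    return set(ranked[:k])


if __name__ == "__main__":
    k = 2
    print("graph (per-step topk):", sorted(range_topk(k)))
    print("instant at last step: ", sorted(topk_at(2, k)))
    print("ranked by max value:  ", sorted(topk_by_max(k)))
```

With k = 2 the per-step selection picks a different pair at each step, so the graph ends up with all four series, while the single-step and ranked-once variants never return more than k.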

valyala answered Oct 18 '22 19:10