Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to calculate the percentage by label for gauge metrics?

I'm exporting some metrics about running tasks, the available data includes the current number of tasks by their status and queue:

# TYPE gauge
tasks{queue="high", status="queued"} 2.0
tasks{queue="high", status="started"} 1.0
tasks{queue="high", status="successful"} 5.0
tasks{queue="high", status="failed"} 1.0

tasks{queue="low", status="queued"} 1.0
tasks{queue="low", status="started"} 2.0
tasks{queue="low", status="successful"} 3.0
tasks{queue="low", status="failed"} 2.0

These numbers change regularly when the tasks are added or expired from the database, so for example the failed tasks number will go up and down depending on the tasks in the database at the time of collecting the data.

I don't have a way to get the total tasks count, so that's all the data that I have, I want to calculate the percentage of the tasks by their status label and create a graph of this value using Grafana.

How the percentage should be calculated?

What I've tried so far:

Get the percentage of all successful tasks:

( sum(tasks{status="successful"}) / sum(tasks) ) * 100

Get the percentage of successful tasks by queue:

( sum(tasks{status="finished"}) by (queue) / sum(tasks) by (queue) ) * 100

How can I get this percentage by time? For example when setting the time range in Grafana? I can use the variable $__range but how should I do the calculation?

I have other data where I have count metrics and I'm doing the following:

sum(increase(tasks_total{status="success"}[$__range])) /
sum(increase(tasks_total{status="started"}[$__range]))

But these are counts and these calculation don't apply to gauge metrics.

like image 852
Pierre Avatar asked Sep 03 '25 06:09

Pierre


1 Answers

The per-status percentage for the currently running tasks can be calculated with the following PromQL query:

100 * (sum(tasks) by (status) / on() group_left() sum(tasks))

It uses on() and group_left() modifiers for / operation in order to augment the default behavior for matching time series on the left and the right side of / operation. See these docs for details.

If you need to get per-queue per-status percentage for the currently running tasks, then the following query should work:

100 * (
  sum(tasks) by (queue, status)
    / on(queue) group_left()
  sum(tasks) by (queue)
)

If you need per-status percentage for tasks, which were run during the selected time $__range, then the following query should work:

100 * (
  sum(sum_over_time(tasks[$__range])) by (status)
    / on() group_left()
  sum(sum_over_time(tasks[$__range]))
)

It uses sum_over_time function for aggregating the number of tasks over the given $__range. The $__range can be substituted with any supported time duration - see these docs.

like image 106
valyala Avatar answered Sep 05 '25 00:09

valyala



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!