Get the time that has passed since the last increase of a Prometheus counter

Tags:

prometheus

Consider a Prometheus metric foo_total that counts the total number of occurrences of an event foo, i.e. the metric will only increase as long as the providing service isn't restarted.

Is there any way to get the timespan (e.g. the number of seconds) since the last increase of that metric? I know that, due to the scrape period, the value won't be very accurate, but an accuracy of a couple of minutes would be sufficient for me.

Background: I want to use that kind of query in Grafana to get an overview of whether some services are used regularly and whether some jobs are done within a defined grace period. I don't have any influence on the metric itself.

muffel asked Jan 11 '19 14:01

1 Answer

Below is the JSON for a Singlestat panel that will display the time of the last update to the up{job="prometheus"} metric. This is not exactly what you asked for: it's the last time rather than the timespan since; it's only useful as a Singlestat panel (i.e. you can't take the value and graph it since it's not a single value); and it will only display changes covered by the dashboard's time range.

The underlying query is timestamp(changes(up{job="prometheus"}[$__interval]) > 0) * 1000, so the query basically returns every evaluation timestamp at which there were any changes during the preceding $__interval (a duration determined dynamically by the time range and the size of the Singlestat panel in pixels). The Singlestat panel will then display the last value, if there is any. (The * 1000 is there because Grafana expects timestamps in milliseconds.)

{
  "type": "singlestat",
  "title": "Last Change",
  "gridPos": {
    "x": 0,
    "y": 0,
    "w": 12,
    "h": 9
  },
  "id": 8,
  "targets": [
    {
      "expr": "timestamp(changes(up{job=\"prometheus\"}[$__interval]) > 0) * 1000",
      "intervalFactor": 1,
      "format": "time_series",
      "refId": "A",
      "interval": "10s"
    }
  ],
  "links": [],
  "maxDataPoints": 100,
  "interval": null,
  "cacheTimeout": null,
  "format": "dateTimeAsIso",
  "prefix": "",
  "postfix": "",
  "nullText": null,
  "valueMaps": [
    {
      "value": "null",
      "op": "=",
      "text": "N/A"
    }
  ],
  "mappingTypes": [
    {
      "name": "value to text",
      "value": 1
    },
    {
      "name": "range to text",
      "value": 2
    }
  ],
  "rangeMaps": [
    {
      "from": "null",
      "to": "null",
      "text": "N/A"
    }
  ],
  "mappingType": 1,
  "nullPointMode": "connected",
  "valueName": "current",
  "prefixFontSize": "50%",
  "valueFontSize": "80%",
  "postfixFontSize": "50%",
  "thresholds": "",
  "colorBackground": false,
  "colorValue": false,
  "colors": [
    "#299c46",
    "rgba(237, 129, 40, 0.89)",
    "#d44a3a"
  ],
  "sparkline": {
    "show": false,
    "full": false,
    "lineColor": "rgb(31, 120, 193)",
    "fillColor": "rgba(31, 118, 189, 0.18)"
  },
  "gauge": {
    "show": false,
    "minValue": 0,
    "maxValue": 100,
    "thresholdMarkers": true,
    "thresholdLabels": false
  },
  "tableColumn": ""
}
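
To make the behaviour of the panel query more concrete, here is a rough Python sketch (not Prometheus code) that mimics it on an in-memory list of samples. The helper names (changes, last_change_ms), the sample data and the closed lookback window are assumptions made for illustration; real PromQL evaluation differs in details such as staleness handling.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix time in seconds, value)


def changes(samples: List[Sample], start: float, end: float) -> int:
    """Count how often the value differs between consecutive samples in [start, end]."""
    window = [v for t, v in samples if start <= t <= end]
    return sum(1 for prev, cur in zip(window, window[1:]) if cur != prev)


def last_change_ms(samples: List[Sample], eval_times: List[float],
                   interval: float) -> Optional[float]:
    """Roughly mimic timestamp(changes(m[interval]) > 0) * 1000 shown as "current"."""
    # One point per evaluation step whose lookback window saw a change.
    points = [t * 1000 for t in eval_times
              if changes(samples, t - interval, t) > 0]
    # The panel shows the last point, or N/A (None here) if there is none.
    return points[-1] if points else None


# Hypothetical series scraped every 15 s; the value changes once, at t=30.
samples = [(0, 1), (15, 1), (30, 2), (45, 2), (60, 2), (75, 2)]
print(last_change_ms(samples, eval_times=[30, 60, 90], interval=30))  # 30000
```

If none of the evaluation steps sees a change, the function returns None, which corresponds to the N/A value mapping in the panel JSON.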

If you wanted this to be more reliable, you could define a Prometheus recording rule with a value equal to the current timestamp if there have been any changes in the last few seconds/minutes (depending on how frequently Prometheus collects the metric), or the rule's previous value otherwise. E.g. (not tested):

groups:
- name: last-update
  rules:
  # Metric names may not contain hyphens, so the recorded series uses an underscore.
  - record: last_update
    expr: |
      timestamp(changes(up{job="prometheus"}[1m]) > 0)
        or
      last_update

Replace up{job="prometheus"} with your metric selector and 1m with an interval that is at least as long as your collection interval, and ideally quite a bit longer (in order to cover any collection interval jitter or missed scrapes).

Then you would use an expression like time() - last_update in Grafana to get the timespan since the last change. And you could use it in any sort of panel, without having to rely on the panel picking the last value for you.
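
The "current timestamp if anything changed, otherwise keep the previous value" logic of the recording rule can be sketched in plain Python as follows. This is only an illustration under assumed numbers (15 s scrapes, a 60 s window, rule evaluation every 30 s); the function names and the sample data are hypothetical, and it is not how Prometheus evaluates rules internally.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix time in seconds, value)


def changed(samples: List[Sample], start: float, end: float) -> bool:
    """True if the value differs between any two consecutive samples in [start, end]."""
    window = [v for t, v in samples if start <= t <= end]
    return any(cur != prev for prev, cur in zip(window, window[1:]))


def evaluate_rule(samples: List[Sample], eval_time: float, window: float,
                  previous: Optional[float]) -> Optional[float]:
    """One rule evaluation: the evaluation time if a change was seen, else the old value."""
    if changed(samples, eval_time - window, eval_time):
        return eval_time
    return previous


# Hypothetical counter scraped every 15 s that increases at t=30 and t=120.
samples = [(t, 0 if t < 30 else 1 if t < 120 else 2) for t in range(0, 181, 15)]

last_update = None
for eval_time in range(60, 181, 30):  # rule evaluated every 30 s
    last_update = evaluate_rule(samples, eval_time, 60, last_update)

now = 300
print(now - last_update)  # 150
```

In this simulation the reported age is 150 s although the last increase happened 180 s before now, because the rule keeps refreshing its timestamp for as long as the change is still inside the lookback window. The longer the window, the larger that offset can get.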

Edit: One of the new features expected in the 2.7.0 release of Prometheus (which is due in about 2-3 weeks, if they keep to their 6-week release schedule) is subquery support, meaning that you should be able to implement the latter, "more reliable" solution without the help of a recording rule.

If I understand this correctly, the query should look something like this:

time() - max_over_time(timestamp(changes(up{job="prometheus"}[5m]) > 0)[24h:1m])

But, just as before, this will not be a particularly efficient query, especially over large numbers of series. Note also that the result is offset by up to the 5 minute range, because a change keeps matching for as long as it stays inside the changes() window; you may want to correct for that, using clamp_min if needed to keep the value non-negative.

Alin Sînpălean answered Sep 28 '22 08:09