Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to correctly scrape and query metrics in Prometheus every hour

I would like Prometheus to scrape metrics every hour and display these hourly scrape events in a table in a Grafana dashboard. I have the global scrape interval set to 1h in the prometheus.yml file. From the prometheus visualizer, it seems like Prometheus scrapes around the 43 minute mark of every hour. However, it also seems like this data is only valid for about 3 minutes: Prometheus graph

My situation, then, is this: In a Grafana table, I set the min step of a query on this metric to 1h, but this causes the table to say that there are no data points. However, if I set the min step to 5 minutes, it displays the hourly scrape events with a timestamp on the 45 minute mark. My guess as to why this happens is that Prometheus starts on the dot of some hour and steps either forward or backward by the min step.

This does achieve what I would like to do, but it also has potential for incorrect behavior if Prometheus ever does something like can been seen at the beginning of the earlier graph. I also know that I can add a time shift, but it seems like it is always relative to the current time rather than an absolute time.

Is it possible to increase the amount of time that the scrape data is valid in Prometheus without having to scrape again every 3 minutes? Or maybe tell Prometheus to scrape at the 00 minute mark of every hour? Or if not, then can I add a relative time shift to the table so that it goes from the 45 minute mark instead of the 00 minute mark?

On a side note, in the above Prometheus graph, the irregular data was scraped after Prometheus was started. I had started Prometheus around 18:30 on the 22nd, but Prometheus didn't scrape until 23:30, and then it scraped at different intervals until it stabilized around 2:43 on the 23rd. Does anybody know why?

like image 669
char Avatar asked Aug 26 '19 18:08

char


People also ask

How often should Prometheus scrape?

In this case the global setting is to scrape every 15 seconds. The evaluation_interval option controls how often Prometheus will evaluate rules. Prometheus uses rules to create new time series and to generate alerts. The rule_files block specifies the location of any rules we want the Prometheus server to load.

What is Prometheus scrape interval?

scrape_interval: 20s # By default, scrape targets every 15seconds.

How does scrape work in Prometheus?

Prometheus gathers metrics from different systems by scraping data from HTTP endpoints. It uses this information to identify issues, such as when an endpoint is missing or should not exist or when a time-series pattern indicates a problem.


1 Answers

Your data disappear because of the staleness strategy implemented in Prometheus. Once a sample has been ingested, the metric is considered stale after 5 minutes. I didn't find any configuration to change that value.

Scraping every hour is not really the philosophy of Prometheus. If your really need to scrape with such a low frequency, it could be a better idea to schedule a job sending the data to a push gateway or using a prom file fed to a node exporter (if it makes sense). You can then scrape this endpoint every 1-2 minutes.

You could also roll your own exporter that memorize the last scrape and scrape anew only if the data age exceeds one hour. (That's the solution I would prefer)

Now, as a quick solution you can request the data over the last hour and average on it. That way, you'll get the last (old) scrape taken into account:

avg_over_time(old_metric[1h])

It should work or have some transient incorrect values if there is some jitters in the scheduling of the scrape.

Regarding the issues you had about late scraping, I suspect the scraping failed at those dates. Prometheus retries only at the next schedule (1h in your case).

like image 66
Michael Doubez Avatar answered Nov 15 '22 10:11

Michael Doubez