
How to use federation to collect Prometheus' metrics from multiple Prometheus instances (each using instance="localhost:9090")

Tags:

prometheus

We have multiple Prometheus instances running in our data centers (I'll refer to them as DC Prometheus instances) and one additional Prometheus instance (called "main" below), which collects metrics from the DC Prometheus instances using the federation feature.

The main Prometheus scrapes {job="prometheus"} metrics from itself, and also from the DC Prometheus instances (each of which scrapes itself at localhost:9090).

The problem is that the main Prometheus complains about out-of-order samples:

WARN[1585] Error on ingesting out-of-order samples numDropped=369 source=target.go:475 target=dc1-prometheus:443

I've found that this is caused by including {job="prometheus"} in the 'match[]' parameter.

I'm trying to solve this with label relabeling, but it doesn't work even with a single DC Prometheus and a constant replacement (I still get the out-of-order samples error). I also don't know what to use as the replacement when there are multiple targets.

  - job_name: 'federate'
    scrape_interval: 15s

    honor_labels: true
    metrics_path: '/prometheus/federate'
    scheme: 'https'

    params:
      'match[]':
        - '{job="some-jobs-here..."}'
        - '{job="prometheus"}'

    relabel_configs:
    - source_labels: ['instance']
      target_label: 'instance'
      regex: 'localhost:9090'
      replacement: '??' # I've tried with 'dc1-prometheus:9090' and single target only.. no luck

    target_groups:
      - targets:
        - 'dc1-prometheus'
        - 'dc2-prometheus'
        - 'dc3-prometheus'

My question is: how do I use relabel_configs to get rid of the out-of-order error? I'm using Prometheus 0.17 everywhere.

Peter Štibraný asked Apr 12 '16 14:04


1 Answer

What you need to do here is specify unique external_labels on each of the data-center Prometheus servers. Each server then adds its labels to the series it exposes on the /federate endpoint, which prevents the clashing time series you're running into.
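As a minimal sketch of this, here is what the configuration on the dc1 server could look like. The label name datacenter and its value 'dc1' are hypothetical choices; any label name works as long as the value is different on every DC server. The scrape job uses target_groups to match the 0.17-era syntax in the question (later Prometheus releases renamed it to static_configs).

```yaml
# prometheus.yml on the dc1 Prometheus server (sketch).
global:
  scrape_interval: 15s
  # Hypothetical label; use 'dc2', 'dc3', ... on the other DC servers.
  # These labels are attached to series served on /federate.
  external_labels:
    datacenter: 'dc1'

scrape_configs:
  # The DC server scrapes itself, which is where instance="localhost:9090" comes from.
  - job_name: 'prometheus'
    target_groups:
      - targets: ['localhost:9090']
```

With a distinct datacenter value on each DC server, the federated {job="prometheus", instance="localhost:9090"} series from dc1, dc2 and dc3 no longer have identical label sets on the main server. Because the federate job uses honor_labels: true, the main server keeps these labels as they are, so the relabel_configs attempt from the question should no longer be needed.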

My blog post on federating Prometheus has an example of a case like this: http://www.robustperception.io/scaling-and-federating-prometheus/

(I should add that relabel_configs can't help you here, as that only changes target labels. metric_relabel_configs changes what comes back from the scrape. See http://www.robustperception.io/life-of-a-label/)

brian-brazil answered Oct 09 '22 04:10