In a hierarchical federated Prometheus setup that uses a pull model for metrics, I see "prometheus" and "prometheus_replica" labels on the metrics that are captured. The system is monitoring a Kubernetes StatefulSet deployment.
When querying or alerting I get duplicate data because of these labels, i.e. I see the same metric both with these labels and without them. This causes wrong counts and false alerts.
I see the "prometheus" and "prometheus_replica" labels in queries on the Prometheus that pulls metrics from the federated endpoint.
I use ServiceMonitor with the Prometheus Operator on every Kubernetes cluster. All the metrics are federated to a single, separate Prometheus, which is where this problem is seen.
Is there any documentation on how these labels get generated? Should those metrics be treated as duplicates, or ignored?
To remove the operator and Prometheus, first delete any custom resources you created in each namespace. The operator will automatically shut down and remove Prometheus and Alertmanager pods, and associated ConfigMaps. After a couple of minutes you can go ahead and remove the operator itself.
Can Prometheus be made highly available? Yes, run identical Prometheus servers on two or more separate machines. Identical alerts will be deduplicated by the Alertmanager.
The Prometheus documentation describes external labels as: "The labels to add to any time series or alerts when communicating with external systems (federation, remote storage, Alertmanager)."
Shards use the Prometheus modulus configuration, which takes the hash of the source label values in order to split scrape targets based on the number of shards. The Prometheus Operator will create (number of shards) multiplied by (number of replicas) pods. Note that scaling down shards will not reshard data onto the remaining instances.
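To make the hash-and-modulus idea above concrete, here is a minimal Python sketch. It is only an illustration: the function names (shard_for, total_pods) and the target addresses are made up, and the MD5-based hash is an assumption standing in for whatever Prometheus computes internally, so the shard numbers it prints will not necessarily match a real deployment.

```python
import hashlib


def shard_for(address: str, shards: int) -> int:
    """Map a target address to a shard index in the range [0, shards)."""
    digest = hashlib.md5(address.encode("utf-8")).digest()
    # Interpret the last 8 bytes of the digest as an unsigned integer and take
    # it modulo the shard count. This shows the general hash-then-modulus idea;
    # the exact bytes Prometheus uses are an assumption here.
    return int.from_bytes(digest[-8:], "big") % shards


def total_pods(shards: int, replicas: int) -> int:
    """The operator runs one pod per (shard, replica) pair."""
    return shards * replicas


if __name__ == "__main__":
    # Hypothetical scrape targets, for illustration only.
    targets = ["10.0.0.1:9100", "10.0.0.2:9100", "10.0.0.3:9100"]
    for target in targets:
        print(target, "-> shard", shard_for(target, 3))
    print("pods for 3 shards x 2 replicas:", total_pods(3, 2))
```

Each target always lands on the same shard for a given shard count, which is also why changing the shard count moves targets around without moving the data already stored.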
I've run into this issue as well. For anyone landing on this comment, the configuration options that affect this are described in the documentation here: https://github.com/prometheus-operator/prometheus-operator/blob/ca400fdc3edd0af0df896a338eca270e115b74d7/Documentation/api.md#prometheusspec . The relevant code is here: https://github.com/prometheus-operator/prometheus-operator/blob/ca400fdc3edd0af0df896a338eca270e115b74d7/pkg/prometheus/promcfg.go#L95-L132
replicaExternalLabelName: Name of Prometheus external label used to denote replica name. Defaults to the value of prometheus_replica. External label will not be added when value is set to empty string ("").
prometheusExternalLabelName: Name of Prometheus external label used to denote Prometheus instance name. Defaults to the value of prometheus. External label will not be added when value is set to empty string ("").
So if you want to remove these duplicates, just set those options to empty strings in the Prometheus custom resource in your cluster.
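As a rough illustration of why these two labels produce duplicates on the federating Prometheus, and what dropping them achieves, here is a small Python sketch. The sample data, label values, and function names (strip_labels, dedupe) are hypothetical and only model the idea; this is not how Prometheus itself stores or merges series.

```python
# The two external labels the operator adds by default.
OPERATOR_LABELS = ("prometheus", "prometheus_replica")


def strip_labels(labels, drop=OPERATOR_LABELS):
    """Return a copy of the label set without the given label names."""
    return {k: v for k, v in labels.items() if k not in drop}


def dedupe(samples, drop=OPERATOR_LABELS):
    """Keep the first sample per series identity once `drop` labels are removed."""
    seen = {}
    for labels, value in samples:
        stripped = strip_labels(labels, drop)
        key = frozenset(stripped.items())
        seen.setdefault(key, (stripped, value))
    return list(seen.values())


if __name__ == "__main__":
    # Hypothetical federated samples: the same series reported by two replicas,
    # plus one copy that arrived without the operator labels.
    samples = [
        ({"__name__": "up", "job": "node", "prometheus": "monitoring/k8s",
          "prometheus_replica": "prometheus-k8s-0"}, 1.0),
        ({"__name__": "up", "job": "node", "prometheus": "monitoring/k8s",
          "prometheus_replica": "prometheus-k8s-1"}, 1.0),
        ({"__name__": "up", "job": "node"}, 1.0),
    ]
    print("series before:", len(samples))
    print("series after :", len(dedupe(samples)))
```

With the labels present, a count over these samples sees three series; once the labels are gone, the three collapse into one, which is the effect that setting both options to empty strings aims for at the source.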
I finally found that these labels come from the Prometheus Operator. They were added for a requirement that isn't written down in any documentation. I see it doesn't work in version 0.17 of the operator. It works in version 0.23.