Prometheus not receiving metrics from cadvisor in GKE

Heyo,

I've deployed a Prometheus, Grafana, kube-state-metrics, Alertmanager, etc. setup on Kubernetes in GKE v1.16.x. I used https://github.com/do-community/doks-monitoring as a jumping-off point for the YAML files.

I've been trying to debug a situation for a few days now and would be very grateful for some help. My prometheus nodes are not getting metrics from cadvisor.

  • All the services and pods in the deployments are running. prometheus, kube-state-metrics, node-exporter, all running - no errors.
  • The cadvisor targets in prometheus UI appear as "up".
  • Prometheus is able to collect other metrics from the cluster, but no pod/container level usage metrics.
  • I can see cadvisor metrics when I query kubectl get --raw "/api/v1/nodes/<your_node>/proxy/metrics/cadvisor", but when I look in prometheus for container_cpu_usage or container_memory_usage, there is no data.
  • My cadvisor scrape job config in prometheus:
    - job_name: kubernetes-cadvisor
      honor_timestamps: true
      scrape_interval: 15s
      scrape_timeout: 10s
      metrics_path: /metrics/cadvisor
      scheme: https
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)

This config is cribbed from the prometheus/docs/examples.

I've tried a whole bunch of different variations on paths and scrape configs, but no luck. Since I can query the metrics using kubectl get, they clearly exist, so it seems to me the issue is in how prometheus communicates with the cadvisor target.
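One way to confirm the "no data" part without clicking around the UI is to ask the Prometheus HTTP API how many container_* series it currently holds. Below is a rough sketch of that check. The base URL http://localhost:9090 (e.g. behind a kubectl port-forward) and the helper names are assumptions for illustration, not part of my setup.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def sample_count(payload: dict) -> int:
    """Read the single value of a count(...) result; an empty result means 0."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    result = payload["data"]["result"]
    if not result:
        # count() over nothing returns an empty vector rather than 0.
        return 0
    # A vector sample looks like {"metric": {...}, "value": [timestamp, "value"]}.
    return int(float(result[0]["value"][1]))


def count_container_series(base_url: str) -> int:
    """Count series whose metric name starts with container_ (needs network access)."""
    promql = 'count({__name__=~"container_.+"})'
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return sample_count(json.load(resp))


# Hypothetical usage, assuming Prometheus is reachable on localhost:9090:
#   print(count_container_series("http://localhost:9090"))
```

If this returns 0 while the kubectl get --raw call shows cadvisor metrics, the series exist on the node but are not being ingested by the scrape job.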

If anyone has experience getting this configured I'd sure appreciate some help debugging.

Cheers

user1797466 asked Sep 10 '25 14:09



1 Answer

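I was able to dig up a blog post with an example configuration that worked for me. On GKE, the endpoint for cadvisor (and kubelet) metrics is different from the standard ones found in documentation examples: the extra relabel rules send each scrape to the API server and use the per-node proxy path, the same path that worked with kubectl get --raw. Here's an excerpt from my working prometheus jobs: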

    - job_name: kubernetes-cadvisor
      honor_timestamps: true
      scrape_interval: 15s
      scrape_timeout: 10s
      metrics_path: /metrics/cadvisor
      scheme: https
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
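        # Scrape through the API server instead of each node's own address.
        - target_label: __address__
          replacement: kubernetes.default.svc.cluster.local:443
        # Use the per-node proxy path (overrides metrics_path above); ${1} is the node name.
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor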
    - job_name: kubernetes-kubelet
      honor_timestamps: true
      scrape_interval: 15s
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: https
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
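      # Scrape through the API server instead of each node's own address.
      - target_label: __address__
        replacement: kubernetes.default.svc.cluster.local:443
      # Use the per-node proxy path (overrides metrics_path above); ${1} is the node name.
      - target_label: __metrics_path__
        source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        replacement: /api/v1/nodes/${1}/proxy/metrics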

Edit: here's a link to the blog post -> https://medium.com/htc-research-engineering-blog/monitoring-kubernetes-clusters-with-grafana-e2a413febefd.
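For anyone wondering what the two extra relabel rules in the cadvisor job actually do to a target, here's a small Python sketch that mimics them. It is a simplified model of "replace"-style relabeling written for illustration, not Prometheus's own implementation. It skips the labelmap rule, and the node name gke-node-1 and starting address 10.0.0.5:10250 are made-up values.

```python
import re

# The two "replace" rules from the cadvisor job above (labelmap rule omitted).
CADVISOR_RULES = [
    {
        "target_label": "__address__",
        "replacement": "kubernetes.default.svc.cluster.local:443",
    },
    {
        "source_labels": ["__meta_kubernetes_node_name"],
        "regex": "(.+)",
        "target_label": "__metrics_path__",
        "replacement": "/api/v1/nodes/${1}/proxy/metrics/cadvisor",
    },
]


def relabel(labels: dict, rules: list) -> dict:
    """Apply 'replace'-style rules in order (simplified model, see note above)."""
    out = dict(labels)
    for rule in rules:
        # Source label values are joined with ';' and the regex must match fully.
        source = ";".join(out.get(name, "") for name in rule.get("source_labels", []))
        match = re.fullmatch(rule.get("regex", "(.*)"), source)
        if match is None:
            continue  # rule does not apply; labels stay unchanged
        # Turn ${1}-style references into Python's \g<1> form before expanding.
        template = re.sub(r"\$\{(\w+)\}", r"\\g<\1>", rule["replacement"])
        out[rule["target_label"]] = match.expand(template)
    return out


def scrape_url(labels: dict) -> str:
    """Assemble the URL that would be scraped from the final target labels."""
    return f'{labels["__scheme__"]}://{labels["__address__"]}{labels["__metrics_path__"]}'


# Hypothetical target as discovered with role: node, before relabeling.
discovered = {
    "__scheme__": "https",
    "__address__": "10.0.0.5:10250",
    "__metrics_path__": "/metrics/cadvisor",
    "__meta_kubernetes_node_name": "gke-node-1",
}

print(scrape_url(relabel(discovered, CADVISOR_RULES)))
```

So instead of hitting the node's own address with /metrics/cadvisor, each scrape goes to the API server with the /api/v1/nodes/<node>/proxy/metrics/cadvisor path, which matches the kubectl get --raw call from the question.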

user1797466 answered Sep 13 '25 16:09
