
Best practices when trying to implement custom Kubernetes monitoring system

I have two Kubernetes clusters representing dev and staging environments.

Separately, I am also deploying a custom DevOps dashboard which will be used to monitor these two clusters. On this dashboard I will need to show information such as:

  • RAM/HD Space/CPU usage of each deployed Pod in each environment
  • Pod health (as in if it has too many container restarts etc)
  • Pod uptime

All these stats have to be available at the cluster level and, preferably, per namespace as well. That is, if I query for a particular namespace, I should get all the resource usage of that namespace.

So the webservice layer of my dashboard will send a service request to the master node of my respective cluster in order to fetch this information.

Another thing I need is to implement real time notifications in my DevOps dashboard. Every time a container fails, I need to catch that event and notify relevant personnel.

I have been reading around, and two things that come up a lot are Prometheus and Metrics Server. Do I need both or will one do? I set up Prometheus on a local cluster, but I can't find any endpoints it exposes that could be called by my dashboard service. I'm also trying to set up Prometheus Alertmanager, but so far it hasn't worked as expected; I'm trying to fix it now. I just wanted to check whether these technologies have the capabilities to meet my requirements.

Thanks!

user538578964 asked Oct 27 '22 11:10


2 Answers

I don't know why you are considering building your own custom monitoring system. The Prometheus Operator provides all the functionality that you mentioned. You will end up needing only your own Grafana dashboard with all the required information.

If you need custom notifications, you can set them up in Alertmanager by creating the appropriate PrometheusRule resources (prometheusrules.monitoring.coreos.com); you can find a lot of preconfigured rules in kubernetes-mixin. Using labels and namespaces in Alertmanager, you can set up a route that notifies the person responsible for a given deployment.
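As a rough illustration of such a rule, here is a minimal Python sketch that builds a PrometheusRule manifest for frequent container restarts and prints it as JSON (which `kubectl apply -f` accepts). It assumes kube-state-metrics is installed, since that is where `kube_pod_container_status_restarts_total` comes from. The function name, alert name, resource name, thresholds and the `team` label are all hypothetical choices for this example, not something the operator prescribes. Depending on how your Prometheus resource's rule selector is configured, the manifest may also need matching metadata labels before it is picked up.

```python
import json


def restart_alert_rule(namespace, max_restarts=3, window="15m"):
    """Build a PrometheusRule manifest (as a dict) that fires when a
    container in `namespace` restarts more than `max_restarts` times
    within `window`. All names below are illustrative assumptions."""
    expr = (
        f'increase(kube_pod_container_status_restarts_total'
        f'{{namespace="{namespace}"}}[{window}]) > {max_restarts}'
    )
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {
            "name": f"{namespace}-container-restarts",  # hypothetical name
            "namespace": namespace,
        },
        "spec": {
            "groups": [
                {
                    "name": "container-restarts",
                    "rules": [
                        {
                            "alert": "ContainerRestartingTooOften",  # hypothetical
                            "expr": expr,
                            "for": "5m",
                            # Labels like these are what Alertmanager routes can match on.
                            "labels": {"severity": "warning", "team": namespace},
                            "annotations": {
                                "summary": (
                                    "Container {{ $labels.container }} in pod "
                                    "{{ $labels.pod }} is restarting frequently"
                                )
                            },
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    print(json.dumps(restart_alert_rule("staging"), indent=2))
```

Saving the output to a file and applying it with `kubectl apply -f` would be one way to try it; the `team` label is only there to show where a routing label could go.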

"Do I need both or will one do?" Yes, you need both: Prometheus collects and aggregates metrics, while Metrics Server exposes metrics from your cluster nodes for Prometheus to scrape.

If you have problems with Prometheus, Alertmanager and so on, consider using a Helm chart as a starting point.

FL3SH answered Nov 11 '22 19:11


Prometheus + Grafana are a pretty standard setup.

Installing kube-prometheus or prometheus-operator via Helm will give you Grafana, Alertmanager, node-exporter and kube-state-metrics by default, all set up for Kubernetes metrics.

Configure Alertmanager to do something with the alerts. SMTP is usually the first thing set up, but I would recommend some sort of event manager if this is a service people need to rely on.
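Besides email, Alertmanager can POST alerts as JSON to a webhook receiver, which is one way a custom dashboard could get near-real-time notifications. The sketch below is a minimal receiver using only the Python standard library. It assumes the standard webhook payload shape (a top-level `alerts` list whose entries carry `status` and `labels`). The port, the function names and the output format are hypothetical, and the `namespace` and `pod` labels are only present if the alerting rule's expression carries them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alerts(payload):
    """Turn an Alertmanager webhook payload into one line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        lines.append(
            "[{status}] {name} namespace={ns} pod={pod}".format(
                status=alert.get("status", "unknown"),
                name=labels.get("alertname", "unknown"),
                ns=labels.get("namespace", "-"),
                pod=labels.get("pod", "-"),
            )
        )
    return lines


class AlertHandler(BaseHTTPRequestHandler):
    """Accepts webhook POSTs and prints a summary of each alert."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for line in summarize_alerts(payload):
            print(line)  # a real dashboard would push this to its users instead
        self.send_response(200)
        self.end_headers()


def serve(port=9099):
    """Run the receiver until interrupted (port 9099 is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), AlertHandler).serve_forever()
```

Calling `serve()` starts the listener; Alertmanager would then need a receiver whose webhook URL points at this service. In place of `print`, the handler could forward the summary to whatever notification channel the dashboard uses.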

Although a Grafana dashboard isn't part of your requirements, setting one up will show you how to connect to Prometheus as a data source. There is documentation on adding a Prometheus data source to Grafana.

There are a number of prebuilt charts available to add to Grafana. There are some charts to visualise alertmanager too.

Your external service won't be collecting the metrics directly; it will be querying the data that Prometheus has already collected and stored inside your cluster. To access the API externally you will need to set up an external path to the Prometheus service. This can be configured via an ingress controller in the Helm deployment:

prometheus.ingress.enabled: true

You can do the same for the Alertmanager API and Grafana if needed.

alertmanager.ingress.enabled: true
grafana.ingress.enabled: true
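Once the Prometheus API is reachable from outside the cluster, a dashboard backend can call its HTTP query endpoint (`/api/v1/query`) with a PromQL expression. The following Python sketch shows one way to fetch per-pod usage for a namespace. It assumes the cAdvisor and kube-state-metrics metric names that the charts above normally provide; the helper names, the base URL in the usage note and the choice of queries are hypothetical, and any authentication in front of the ingress is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# PromQL templates keyed by a label of our own choosing; {ns} is filled in per call.
NAMESPACE_QUERIES = {
    "memory_bytes": (
        'sum by (pod) (container_memory_working_set_bytes'
        '{{namespace="{ns}", container!=""}})'
    ),
    "cpu_cores": (
        'sum by (pod) (rate(container_cpu_usage_seconds_total'
        '{{namespace="{ns}", container!=""}}[5m]))'
    ),
    "restarts": (
        'sum by (pod) (kube_pod_container_status_restarts_total'
        '{{namespace="{ns}"}})'
    ),
}


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(response):
    """Map pod name -> float value from an instant-vector API response."""
    if response.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {response.get('error')}")
    result = {}
    for sample in response["data"]["result"]:
        pod = sample["metric"].get("pod", "")
        _timestamp, value = sample["value"]  # value arrives as a string
        result[pod] = float(value)
    return result


def pod_usage(base_url, namespace, metric):
    """Query Prometheus over HTTP for one metric across a namespace's pods."""
    promql = NAMESPACE_QUERIES[metric].format(ns=namespace)
    with urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(json.load(resp))
```

For example, `pod_usage("http://prometheus.example.com", "dev", "memory_bytes")` would return a dict of pod names to bytes, where the hostname stands in for whatever the ingress exposes. Cluster-level totals could be obtained the same way by dropping the namespace filter and changing the `sum by` clause.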

You could use Grafana outside the cluster as your dashboard via the same prometheus ingress if it proves useful.

Matt answered Nov 11 '22 19:11
