
What causes pods to be slow in kubernetes?

Tags:

kubernetes

Certain pods on my cluster are extremely slow in almost all aspects. Startup time, network, i/o.

I have minimized the application code in these containers and it seems to have no effect, these are basically minimal containers running a simple webapi with a health check endpoint.

I'm wondering if someone can help me figure out what's wrong, or how to debug this.

When I say slow in all aspects, I mean a couple of things:

  1. Very slow startup. I actually have to change my readiness probe initial delay to nearly 5 minutes (I've sketched the probe config after this list).

  2. Inside the container, running any command is slow. An apt-get update takes nearly 5 minutes, even if the container has been running for hours.

  3. Any connections to an RDS database will time out for at least the first 10 minutes the pod is running. After that it's hit or miss: sometimes normal speed, sometimes we'll start getting connection timeouts again (mainly if the pod hasn't been used/requested for a while).
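
For context, the readiness probe looks roughly like this (the /health path and port are placeholders for my actual health check endpoint; the only unusual part is the huge initial delay):

    # Readiness probe as currently configured (values approximate,
    # path/port are placeholders for the real health check endpoint).
    readinessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 300   # has to be ~5 minutes or the pod never becomes ready
      periodSeconds: 10
      failureThreshold: 3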

On nearly identical pods with the same base image, the container will start in less than a couple of seconds, and running an apt-get update takes maybe 3 seconds. I cannot for the life of me see what is different between the pods that causes some to be 'good pods' and others to be 'bad pods'.

Running any of these images locally, they start in no time (less than a second or so).

My Environment

Cluster (AWS)

  • 1 c4.large master
  • 3 c4.xlarge nodes
  • ~10-20 pods per node
  • provisioned with kops using 'standard' settings (I haven't done anything tricky)

Things I've checked/tried

  • too many pods

    My first thought was that maybe I'm running too many pods. I've launched brand new nodes for this (c4.xlarge) and had this pod be the only pod running in the cluster; the issue is still seen.

  • node resources

    Checking every node-level metric I could, nothing looks out of the ordinary (I also tried on several brand new, fairly high-powered nodes).

  • Deployment/Pod Metrics

    I'm happy to show whatever metric anyone can think of here; nothing looks out of the norm. I have Prometheus running and have looked into every metric I could think to check. I can't see a difference between a 'good' running pod and a 'bad' one.

  • cluster itself

    I actually have 2 clusters, both provisioned with kops, and this is seen on both (though not always with the same applications, which is odd).

Any help here is appreciated

Asked Dec 12 '17 by Kyle Gobel


People also ask

Why is Kubernetes so slow?

Kubernetes uses the Linux kernel's CFS Bandwidth Control, which allots CPU time in microseconds to pre-defined groups. This can lead to throttling issues: A node can look slow, even when there is not a lot else happening on that CPU.

What causes CPU throttling in Kubernetes?

CPU throttling occurs when you configure a CPU limit on a container, which can inadvertently slow your application's response time. Even if you have more than enough resources on your underlying node, your container workload will still be throttled if its limit was not configured properly.
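
As a rough illustration (the container name and image below are made up, not taken from the question): the kubelet translates a CPU limit into a CFS quota, so the container gets at most that share of CPU time per 100 ms scheduling period and sits throttled for the rest of the period once the quota is used up, even on an otherwise idle node.

    # Hypothetical container spec: a 500m CPU limit becomes a CFS quota of
    # about 50ms of CPU time per 100ms period (cpu.cfs_quota_us=50000,
    # cpu.cfs_period_us=100000). Once that is spent, the container is
    # throttled until the next period, regardless of how idle the node is.
    containers:
      - name: webapi            # placeholder name
        image: example/webapi   # placeholder image
        resources:
          limits:
            cpu: 500m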


1 Answer

This is likely happening either because the configured Resource Limits are too constrained, or because the lack of Resource Requests is allowing pods to be scheduled onto nodes that do not have the capacity needed to run their workloads.

You can resolve this by defining proper resource requests and limits for each of your applications deployed to Kubernetes. In a nutshell, you can control requests and limits for shares of CPU time, bytes of memory, and Linux hugepages.
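
A minimal sketch of what that looks like in a container spec (the name, image, and values below are illustrative, not tuned for the asker's workload):

    # Illustrative resource requests/limits for a container in a Deployment.
    # Requests are what the scheduler uses to place the pod on a node with
    # enough free capacity; limits are the hard ceiling enforced at runtime.
    containers:
      - name: webapi                # placeholder name
        image: example/webapi:1.0   # placeholder image
        resources:
          requests:
            cpu: 250m       # scheduler reserves a quarter of a core
            memory: 256Mi
          limits:
            cpu: "1"        # CPU usage above this is throttled
            memory: 512Mi   # exceeding this gets the container OOM-killed

Without requests, the scheduler treats the pod as costing essentially nothing and can pack it onto a node that is already saturated, which would match the seemingly random split between 'good pods' and 'bad pods' described in the question.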

Answered Oct 24 '22 by TJ Zimmerman