I have set up CPU and memory requests equal to limits on all containers of my pod in order to qualify it for the Guaranteed Quality of Service class. Now, look at these CPU usage and CPU throttling graphs for the same pod over the last 6 hours.
Does this look normal and expected?
CPU usage never even touched 50% of the set limit, and yet the pod was being throttled up to 58% at times.
And a side question: what does that red line at 25% in the throttling graph indicate?
I did some research on this topic and found that there was a bug in the Linux kernel that could have caused this, and that it was fixed in version 4.18 of the kernel. Reference: this and this
We are on GKE running Google's Container-Optimized OS. I checked the Linux kernel version on our nodes and they are on 4.19.112+, so I guess we already have that patch? What else could be the reason for this throttling pattern?
P.S. This pod (actually a deployment with autoscaling) is deployed on a separate node pool that has none of our other workloads running on it, so the only pods other than this deployment running on nodes in this node pool are some metrics and logging agents and exporters. Here is the full list of pods running on the same node on which the pod discussed above is scheduled. There are indeed some pods that don't have any CPU limits set on them. Do I need to somehow set CPU limits on these as well?
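(A list like that can be reproduced with something along these lines; the node name is a placeholder, not a value from this cluster.)
# Sketch: list every pod scheduled on a given node together with the CPU limits
# of its containers; "<none>" in the last column means that container has no CPU limit.
NODE=<node-name>   # placeholder, replace with a node from this pool
kubectl get pods --all-namespaces --field-selector spec.nodeName="$NODE" \
  -o custom-columns='NAMESPACE:.metadata.namespace,POD:.metadata.name,CPU_LIMITS:.spec.containers[*].resources.limits.cpu'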
Our GKE version is 1.16.9-gke.2
Here is the manifest file containing the deployment, service, and autoscaler definitions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: endpoints
  labels:
    app: endpoints
spec:
  replicas: 2
  selector:
    matchLabels:
      run: endpoints
  strategy:
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
  template:
    metadata:
      labels:
        run: endpoints
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: endpoints
        image: gcr.io/<PROJECT_ID>/endpoints:<RELEASE_VERSION_PLACEHOLDER>
        livenessProbe:
          httpGet:
            path: /probes/live
            port: 8080
          initialDelaySeconds: 20
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /probes/ready
            port: 8080
          initialDelaySeconds: 20
          timeoutSeconds: 5
        ports:
        - containerPort: 8080
          protocol: TCP
        env:
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: "/path/to/secret/gke-endpoints-deployments-access.json"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: DEPLOYMENT_NAME
          value: "endpoints"
        resources:
          requests:
            memory: "5Gi"
            cpu: 2
          limits:
            memory: "5Gi"
            cpu: 2
        volumeMounts:
        - name: endpoints-gcp-access
          mountPath: /path/to/secret
          readOnly: true
        lifecycle:
          preStop:
            exec:
              # SIGTERM triggers a quick exit; gracefully terminate instead
              command: ["/bin/sh", "-c", "sleep 3; /usr/sbin/nginx -s quit; sleep 57"]
      # [START proxy_container]
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.16
        command: ["/cloud_sql_proxy",
                  "-instances=<PROJECT_ID>:<ZONE>:prod-db=tcp:3306,<PROJECT_ID>:<ZONE>:prod-db-read-replica=tcp:3307",
                  "-credential_file=/path/to/secret/gke-endpoints-deployments-access.json"]
        # [START cloudsql_security_context]
        securityContext:
          runAsUser: 2  # non-root user
          allowPrivilegeEscalation: false
        # [END cloudsql_security_context]
        resources:
          requests:
            memory: "50Mi"
            cpu: 0.1
          limits:
            memory: "50Mi"
            cpu: 0.1
        volumeMounts:
        - name: endpoints-gcp-access
          mountPath: /path/to/secret
          readOnly: true
      # [END proxy_container]
      # [START nginx-prometheus-exporter container]
      - name: nginx-prometheus-exporter
        image: nginx/nginx-prometheus-exporter:0.7.0
        ports:
        - containerPort: 9113
          protocol: TCP
        env:
        - name: CONST_LABELS
          value: "app=endpoints"
        resources:
          requests:
            memory: "50Mi"
            cpu: 0.1
          limits:
            memory: "50Mi"
            cpu: 0.1
      # [END nginx-prometheus-exporter container]
      tolerations:
      - key: "qosclass"
        operator: "Equal"
        value: "guaranteed"
        effect: "NoSchedule"
      nodeSelector:
        qosclass: guaranteed
      # [START volumes]
      volumes:
      - name: endpoints-gcp-access
        secret:
          secretName: endpoints-gcp-access
      # [END volumes]
---
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: endpoints-backendconfig
spec:
  timeoutSec: 60
  connectionDraining:
    drainingTimeoutSec: 60
---
apiVersion: v1
kind: Service
metadata:
  name: endpoints
  labels:
    app: endpoints
  annotations:
    cloud.google.com/neg: '{"ingress": true}'  # Creates a NEG after an Ingress is created
    beta.cloud.google.com/backend-config: '{"ports": {"80":"endpoints-backendconfig"}}'
spec:
  type: NodePort
  selector:
    run: endpoints
  ports:
  - name: endpoints-nginx
    port: 80
    protocol: TCP
    targetPort: 8080
  - name: endpoints-metrics
    port: 81
    protocol: TCP
    targetPort: 9113
---
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: endpoints-autoscaler
spec:
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 40
  - type: External
    external:
      metricName: external.googleapis.com|prometheus|nginx_http_requests_total
      metricSelector:
        matchLabels:
          metric.labels.app: endpoints
      targetAverageValue: "5"
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: endpoints
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: endpoints-nginx-monitor
  namespace: monitoring
  labels:
    app: endpoints-nginx-monitor
    chart: prometheus-operator-8.13.7
    release: prom-operator
    heritage: Tiller
spec:
  selector:
    matchLabels:
      app: endpoints
  namespaceSelector:
    any: true
  endpoints:
  - port: endpoints-metrics
    path: "/metrics"
And here is the Dockerfile for the only custom container image used in the deployment:
# Dockerfile extending the generic PHP image with application files for a
# single application.
FROM gcr.io/google-appengine/php:latest
# The Docker image will configure the document root according to this
# environment variable.
ENV DOCUMENT_ROOT /app
RUN /bin/bash /stackdriver-files/enable_stackdriver_integration.sh
CPU throttling occurs when you configure a CPU limit on a container, which can inadvertently slow your application's response time. Even if you have more than enough resources on the underlying node, your container workload will still be throttled if its limit is not configured properly.
CPU limits on Kubernetes are an antipattern. Many people think you need CPU limits on Kubernetes, but this isn't true: in most cases, Kubernetes CPU limits do more harm than good.
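If you decide to follow that advice, one way to drop only the CPU limit while keeping the CPU request and the memory limit is a JSON patch like the sketch below. Treat it as a sketch, not a drop-in command: the deployment and container come from the manifest above, and removing the CPU limit means the pod no longer qualifies for the Guaranteed QoS class (it becomes Burstable).
# Sketch: remove only the CPU limit from the first container (index 0) of the
# "endpoints" deployment; CPU request and memory limit stay in place.
kubectl patch deployment endpoints --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/resources/limits/cpu"}]'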
For example, say each container has a limit of 0.5 CPU and 128 MiB of memory. If a container attempts to exceed that limit, the system throttles the container.
Kubernetes uses kernel throttling to implement CPU limits. If an application goes above its limit, it gets throttled (i.e. it receives fewer CPU cycles). Memory requests and limits, on the other hand, are implemented differently, and violations are easier to detect.
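If you want to cross-check what your dashboard shows against the raw counters, the kubelet's cAdvisor endpoint exposes the CFS statistics per container. A rough sketch (the node name is a placeholder):
# Sketch: read the raw CFS throttling counters from the kubelet's cAdvisor endpoint.
# container_cpu_cfs_throttled_periods_total divided by container_cpu_cfs_periods_total
# gives the fraction of scheduling periods in which the container was throttled.
NODE=<node-name>   # placeholder
kubectl get --raw "/api/v1/nodes/${NODE}/proxy/metrics/cadvisor" \
  | grep -E 'container_cpu_cfs_(periods|throttled_periods)_total' \
  | grep endpoints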
If you configure a CPU limit in Kubernetes, it is enforced via a CFS period and quota. If a process running in a container uses up its quota, it is preempted and has to wait for the next period: it is throttled. This is the effect you are experiencing.
Let's say you have configured 2 cores as the CPU limit; Kubernetes translates this into a quota of 200ms of CPU time per 100ms period. That means the container can use at most 200ms of CPU time in each 100ms window without getting throttled. And this is where the misunderstanding usually starts.
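You can see that translation directly in the container's cgroup. A sketch, assuming cgroup v1 (which these 4.19 COS nodes use) and a placeholder pod name:
# Sketch: print the CFS period and quota inside the running container.
# With a 2-CPU limit you should see period=100000 (us) and quota=200000 (us);
# cpu.stat additionally reports nr_periods, nr_throttled, and throttled_time.
kubectl exec <pod-name> -c endpoints -- \
  cat /sys/fs/cgroup/cpu/cpu.cfs_period_us /sys/fs/cgroup/cpu/cpu.cfs_quota_us /sys/fs/cgroup/cpu/cpu.stat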
As far as I can tell, the theoretical upper bound of throttling per period is n * 100ms - limit, where n is the number of vCPUs the workload can run on in parallel and limit is how many milliseconds of CPU you are allotted in a 100ms window (calculated earlier as cpuLimit * 100ms). For example, with a 2-CPU limit (200ms of quota) on a node with 8 vCPUs, threads could in theory accumulate up to 8 * 100ms - 200ms = 600ms of throttled time per period.
I don't know what that red line is, so I'll skip that one. It would be nice, though, to know what you expected to happen in the CPU throttling case.
So, about your CPU usage and throttling: there is no indication that anything is going wrong. CPU throttling happens on any modern system when there is plenty of CPU available; the system slows down the clock and starts running slower (e.g. a 2.3GHz machine switches to 2.0GHz). This is the reason you can't set a CPU limit based on a percentage.
So, from your graphs, what I speculate we are seeing is the CPU clock going down and, naturally, a percentage going up, as expected. Nothing weird.