
Kubernetes Garbage Collection fails - FreeDiskSpaceFailed & ImageGCFailed

The garbage collector (GC) in my Kubernetes cluster is apparently failing to delete any images, and the server's disk is filling up.

Can you point me to the ImageGC logs that show the error from the image deletion attempts, or to the reason this is happening?

3m         5d          1591      ip-xxx.internal     Node                                          Warning   FreeDiskSpaceFailed       {kubelet ip-xxx.internal}     failed to garbage collect required amount of images. Wanted to free 6312950988, but freed 0
3m         5d          1591      ip-xxx.internal     Node                                          Warning   ImageGCFailed             {kubelet ip-xxx.internal}     failed to garbage collect required amount of images. Wanted to free 6312950988, but freed 0

Thanks!

felipeclopes asked Aug 01 '17 19:08



1 Answer

There may not be much in the way of logs (see this issue), but Kubernetes event data may help. Look for events whose reason is ImageGCFailed.
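As a rough illustration of pulling those events out and reading them, here is a minimal Python sketch. It assumes kubectl is installed and configured for the cluster, and that the event message has the same wording as in the question; newer kubelet versions may phrase it differently. The function names (parse_gc_message, fetch_image_gc_events, summarize) are made up for this example and are not part of any Kubernetes tooling.

```python
import json
import re
import subprocess

# Matches the kubelet message wording shown in the question (assumption:
# other kubelet versions may use a different message format).
_GC_MESSAGE = re.compile(r"Wanted to free (\d+), but freed (\d+)")


def parse_gc_message(message):
    """Return (wanted_bytes, freed_bytes), or None if the message does not match."""
    match = _GC_MESSAGE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def fetch_image_gc_events():
    """Ask kubectl for events whose reason is ImageGCFailed (needs cluster access)."""
    cmd = [
        "kubectl", "get", "events", "--all-namespaces",
        "--field-selector", "reason=ImageGCFailed",
        "-o", "json",
    ]
    output = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return json.loads(output)["items"]


def summarize(events):
    """Yield (node, count, wanted_bytes, freed_bytes) for each parseable event."""
    for event in events:
        parsed = parse_gc_message(event.get("message", ""))
        if parsed is not None:
            yield (event["involvedObject"]["name"], event.get("count", 1), *parsed)
```

Calling list(summarize(fetch_image_gc_events())) on a machine with cluster access should list each affected node with how many bytes the kubelet wanted to free and how many it actually freed. For the events in the question, that is 6312950988 bytes wanted (about 5.9 GiB) and 0 freed.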

Alternatively, you could check the cAdvisor Prometheus metrics to see whether they expose any information about container garbage collection.

Docs on the GC feature in general: https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/

bosgood answered Sep 19 '22 15:09