 

Kubernetes has a ton of pods in error state that can't seem to be cleared

I was originally trying to run a Job that seemed to get stuck in a CrashLoopBackOff. Here is the Job manifest:

apiVersion: batch/v1
kind: Job
metadata:
  name: es-setup-indexes
  namespace: elk-test
spec:
  template:
    metadata:
      name: es-setup-indexes
    spec:
      containers:
      - name: es-setup-indexes
        image: appropriate/curl
        command: ['curl -H  "Content-Type: application/json" -XPUT http://elasticsearch.elk-test.svc.cluster.local:9200/_template/filebeat -d@/etc/filebeat/filebeat.template.json']
        volumeMounts:
        - name: configmap-volume
          mountPath: /etc/filebeat/filebeat.template.json
          subPath: filebeat.template.json
      restartPolicy: Never
      volumes:
      - name: configmap-volume
        configMap:
          name: elasticsearch-configmap-indexes
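(A likely culprit for the crash loop, for what it's worth: command takes an exec-style argv list, so passing the whole curl invocation as a single string makes the container look for one executable literally named that string, and it fails immediately. A sketch of the split-out form, using the same image and URL as above:

command: ["curl", "-H", "Content-Type: application/json", "-XPUT",
          "http://elasticsearch.elk-test.svc.cluster.local:9200/_template/filebeat",
          "-d@/etc/filebeat/filebeat.template.json"]

And because restartPolicy is Never, each failed attempt makes the Job controller create a replacement pod rather than restart the old one, which is plausibly how the Error pods below piled up.)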

I tried deleting the job but it would only work if I ran the following command:

kubectl delete job es-setup-indexes --cascade=false 
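Note that --cascade=false deletes only the Job object itself and orphans the pods it created, which is consistent with the Error pods lingering below. In newer kubectl versions the same behavior is spelled --cascade=orphan; a plain delete would in principle also remove the Job's pods (namespace taken from the manifest above):

kubectl delete job es-setup-indexes -n elk-test   # default cascade also deletes the Job's pods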

After that I noticed when running:

kubectl get pods -w 

I would get a TON of pods in an Error state and I see no way to clean them up. Here is just a small sample of the output when I run get pods:

es-setup-indexes-zvx9c   0/1       Error     0         20h
es-setup-indexes-zw23w   0/1       Error     0         15h
es-setup-indexes-zw57h   0/1       Error     0         21h
es-setup-indexes-zw6l9   0/1       Error     0         16h
es-setup-indexes-zw7fc   0/1       Error     0         22h
es-setup-indexes-zw9bw   0/1       Error     0         12h
es-setup-indexes-zw9ck   0/1       Error     0         1d
es-setup-indexes-zwf54   0/1       Error     0         18h
es-setup-indexes-zwlmg   0/1       Error     0         16h
es-setup-indexes-zwmsm   0/1       Error     0         21h
es-setup-indexes-zwp37   0/1       Error     0         22h
es-setup-indexes-zwzln   0/1       Error     0         22h
es-setup-indexes-zx4g3   0/1       Error     0         11h
es-setup-indexes-zx4hd   0/1       Error     0         21h
es-setup-indexes-zx512   0/1       Error     0         1d
es-setup-indexes-zx638   0/1       Error     0         17h
es-setup-indexes-zx64c   0/1       Error     0         21h
es-setup-indexes-zxczt   0/1       Error     0         15h
es-setup-indexes-zxdzf   0/1       Error     0         14h
es-setup-indexes-zxf56   0/1       Error     0         1d
es-setup-indexes-zxf9r   0/1       Error     0         16h
es-setup-indexes-zxg0m   0/1       Error     0         14h
es-setup-indexes-zxg71   0/1       Error     0         1d
es-setup-indexes-zxgwz   0/1       Error     0         19h
es-setup-indexes-zxkpm   0/1       Error     0         23h
es-setup-indexes-zxkvb   0/1       Error     0         15h
es-setup-indexes-zxpgg   0/1       Error     0         20h
es-setup-indexes-zxqh3   0/1       Error     0         1d
es-setup-indexes-zxr7f   0/1       Error     0         22h
es-setup-indexes-zxxbs   0/1       Error     0         13h
es-setup-indexes-zz7xr   0/1       Error     0         12h
es-setup-indexes-zzbjq   0/1       Error     0         13h
es-setup-indexes-zzc0z   0/1       Error     0         16h
es-setup-indexes-zzdb6   0/1       Error     0         1d
es-setup-indexes-zzjh2   0/1       Error     0         21h
es-setup-indexes-zzm77   0/1       Error     0         1d
es-setup-indexes-zzqt5   0/1       Error     0         12h
es-setup-indexes-zzr79   0/1       Error     0         16h
es-setup-indexes-zzsfx   0/1       Error     0         1d
es-setup-indexes-zzx1r   0/1       Error     0         21h
es-setup-indexes-zzx6j   0/1       Error     0         1d
kibana-kq51v             1/1       Running   0         10h

But if I look at the jobs I get nothing related to that anymore:

$ kubectl get jobs --all-namespaces
NAMESPACE     NAME               DESIRED   SUCCESSFUL   AGE
kube-system   configure-calico   1         1            46d

I've also noticed that kubectl seems much slower to respond. I don't know if the pods are continuously being restarted or are stuck in some broken state, but it would be great if someone could tell me how to troubleshoot this, as I haven't come across an issue like this in Kubernetes before.
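(For anyone triaging a similar state, the usual first steps are to inspect one of the failed pods directly; the pod name below is taken from the sample output above:

kubectl describe pod es-setup-indexes-zvx9c -n elk-test   # events and container exit codes
kubectl logs es-setup-indexes-zvx9c -n elk-test           # the container's own output

The slowness is plausibly the API server straining under thousands of leftover pod objects.)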

Kube info:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:44:38Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:33:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
asked Jun 06 '17 by xamox


People also ask

How do I delete all error pods in Kubernetes?

To delete all failed pods in all namespaces, run: kubectl delete pods --field-selector status.phase=Failed -A. Note that this only matches pods that actually reach the Failed phase; with a restartPolicy of Always the kubelet keeps restarting the container in place, so such pods never show up as Failed.

How do you clean up Kubernetes pods?

If you create pods directly (not via a deployment), you can delete them directly and they will stay deleted. Pods created directly, deployments, and services can all be deleted independently of one another; the order doesn't matter. If you want to delete them but keep the namespace, delete them in any order.
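A quick illustration of the difference (resource names here are hypothetical):

kubectl delete pod my-standalone-pod         # created directly: stays deleted
kubectl delete pod my-deploy-7d4f9-abc12     # owned by a ReplicaSet: gets recreated
kubectl delete deployment my-deploy          # removes the Deployment and its pods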


1 Answer

kubectl delete pods --field-selector status.phase=Failed -n <your-namespace>

This cleans up any failed pods in <your-namespace>.
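To preview what will be removed first, or to sweep every namespace at once, the same field selector works with get and with -A (shorthand for --all-namespaces on reasonably recent kubectl; the elk-test namespace is taken from the question):

kubectl get pods --field-selector status.phase=Failed -n elk-test    # dry look before deleting
kubectl delete pods --field-selector status.phase=Failed -A          # sweep all namespaces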

answered Oct 10 '22 by Kevin Pedersen