I'm running a Kubernetes cluster which has worked fine for several months. Today, when I was about to deploy some updates, I started getting timeouts from the server.
Running $ kubectl get nodes
yields
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get nodes)
Running $ kubectl get pods --all-namespaces
yields
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get pods)
Running $ kubectl get deployments
yields
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get deployments.extensions)
Running $ kubectl get svc
yields
Error from server (Timeout): the server was unable to return a response in the time allotted, but may still be processing the request (get services)
Running $ kubectl cluster-info
yields (note no output after the master)
Kubernetes master is running at https://cluster.mysite.com
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
As I get these timeouts for every command, troubleshooting is impossible.
How can I continue from here to access my servers? I'm using kube-aws and an AWS CloudFormation VPC. Thanks for your time.
EDIT:
As per request, I ran $ kubectl get pods -v 7 and, after a bunch of cached responses, got this:
I0103 16:51:32.196859 25644 round_trippers.go:414] GET cluster.mysite.com/api/v1/nodes
I0103 16:51:32.196888 25644 round_trippers.go:421] Request Headers:
I0103 16:51:32.196894 25644 round_trippers.go:424] Accept: application/json
I0103 16:51:32.196899 25644 round_trippers.go:424] User-Agent: kubectl/v1.8.3 (darwin/amd64) kubernetes/f0efb3c
I0103 16:52:32.239841 25644 round_trippers.go:439] Response Status: 504 Gateway Timeout in 60044 milliseconds
I also ran $ kubectl cluster-info dump -v 7
and got:
I0103 16:51:32.196888 25644 round_trippers.go:421] Request Headers:
I0103 16:51:32.196894 25644 round_trippers.go:424] Accept: application/json
I0103 16:51:32.196899 25644 round_trippers.go:424] User-Agent: kubectl/v1.8.3 (darwin/amd64) kubernetes/f0efb3c
I0103 16:52:32.239841 25644 round_trippers.go:439] Response Status: 504 Gateway Timeout in 60044 milliseconds
I0103 16:52:32.242362 25644 helpers.go:207] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "the server was unable to return a response in the time allotted, but may still be processing the request (get nodes)",
  "reason": "Timeout",
  "details": {
    "kind": "nodes",
    "causes": [
      {
        "reason": "UnexpectedServerResponse",
        "message": "{\"metadata\":{},\"status\":\"Failure\",\"message\":\"The list operation against nodes could not be completed at this time, please try again.\",\"reason\":\"ServerTimeout\",\"details\":{\"name\":\"list\",\"kind\":\"nodes\"},\"code\":500}"
      }
    ]
  },
  "code": 504
}]
EDIT 2:
Okay, now I'm just getting Unable to connect to the server: EOF on every request, and I'm starting to get scared. This is a production cluster and I can't even access it to troubleshoot. Does anyone have a hint on how to proceed?
EDIT 3: I've gotten as far as realizing that the etcd cluster was not working properly, with 2 of 3 nodes out of sync. Restarting one node had it properly rejoining the cluster, but on the second one the services won't start. Of the services that fail to start, the first three all give the error etcdadm-check.service: Control process exited, code=exited status=3, and the last one gives user@0.service: Start request repeated too quickly. Any hints on how to handle this?
Also, after restoring the second etcd node, every kubectl command now fails with Unable to connect to the server: x509: certificate signed by unknown authority. Does this signify data loss? My certificates are still valid for over half a year, and I haven't changed anything about them.
EDIT 4:
I still have the etcd issue, but I am following the instructions in camil's answer for now and will update with the result. However, I solved the invalid-certificate issue simply by re-running $ kube-aws render credentials with the proper paths to my intermediate root CA.
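For reference, the re-render invocation looked roughly like this (the file paths are placeholders for my own intermediate CA files):

# Re-render the cluster credentials against the existing intermediate CA
kube-aws render credentials \
  --ca-cert-path=./credentials/intermediate-ca.pem \
  --ca-key-path=./credentials/intermediate-ca-key.pem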
To avoid the timeouts, you can pass the flag --request-timeout='1s'. This will allow further debugging.
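For example, to fail fast and see the request details:

kubectl get nodes --request-timeout='1s' -v=7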
I see you are running kube-aws, so it will be safe to terminate the master instances (at least one, if you run multiple masters). The ASG will replace them automatically. You can also do this with the etcd nodes.
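If you prefer doing this from the CLI rather than the AWS console, something along these lines should work (the instance ID is a placeholder; keeping the desired capacity unchanged makes the ASG launch a replacement):

# Terminate one master instance and let the ASG replace it
aws autoscaling terminate-instance-in-auto-scaling-group \
  --instance-id i-0123456789abcdef0 \
  --no-should-decrement-desired-capacity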
If the issue still persists, you will have to SSH into the masters and check the logs and services by running commands like:
journalctl -xe
systemctl status -l kubelet.service
systemctl status -l flanneld.service
systemctl status -l docker.service
rkt list
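If the etcd cluster is suspect, it may also help to look at the etcd unit directly on the etcd nodes (on CoreOS Container Linux, which kube-aws deploys, the unit is typically etcd-member.service):

journalctl -u etcd-member.service --no-pager | tail -n 50
systemctl status -l etcd-member.service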
You can also use this function to debug with kubectl from inside the masters:
kubectl() {
  # Run kubectl from the hyperkube image, sharing the host's network and DNS config
  /usr/bin/docker run --rm --net=host \
    -v /etc/resolv.conf:/etc/resolv.conf \
    -v /srv/kube-aws/plugins:/srv/kube-aws/plugins \
    quay.io/coreos/hyperkube:v1.9.0_coreos.0 /hyperkube kubectl "$@"
}
Then try these commands:
kubectl get componentstatus
kubectl cluster-info
kubectl get pods -n kube-system
kubectl get events -n kube-system
Check the connectivity to etcd from the masters:
export $(cat /etc/etcd-environment | tr -d "'")
/usr/bin/etcdctl \
  --ca-file=/etc/kubernetes/ssl/etcd-trusted-ca.pem \
  --cert-file=/etc/kubernetes/ssl/etcd-client.pem \
  --key-file=/etc/kubernetes/ssl/etcd-client-key.pem \
  --endpoints="${ETCD_ENDPOINTS}" \
  cluster-health
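If cluster-health reports an unhealthy member, one possible recovery path (sketched here with etcd v2 syntax; the member ID, name, and peer URL below are placeholders) is to remove the broken member and re-add it so it can rejoin with a clean data directory:

# Reuse the TLS flags from the check above
ETCDCTL_FLAGS="--ca-file=/etc/kubernetes/ssl/etcd-trusted-ca.pem --cert-file=/etc/kubernetes/ssl/etcd-client.pem --key-file=/etc/kubernetes/ssl/etcd-client-key.pem --endpoints=${ETCD_ENDPOINTS}"

# Find the ID of the unhealthy member
/usr/bin/etcdctl $ETCDCTL_FLAGS member list

# Remove it, then re-add it (clear the member's data dir on that node before it rejoins)
/usr/bin/etcdctl $ETCDCTL_FLAGS member remove 8211f1d0f64f3269
/usr/bin/etcdctl $ETCDCTL_FLAGS member add etcd2 https://10.0.0.2:2380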
Running rm -r ~/.kube/cache/discovery worked for me. My timeout messages looked different from yours, though:
E0528 20:32:29.191243 1730 request.go:975] Unexpected error when reading response body: net/http: request canceled (Client.Timeout exceeded while reading body)