After a cluster upgrade, one of my three masters can't rejoin the cluster. I have an HA cluster running across us-east-1a, us-east-1b and us-east-1c, and the master in us-east-1a can't rejoin.
I tried scaling the master-us-east-1a instance group down to zero nodes and back up to one, but the new EC2 instance starts with the same problem and still can't rejoin the cluster. It seems to start from a backup or something.
I also connected to the master and tried restarting services such as protokube and docker, but that didn't solve the problem either.
Connecting to the master via ssh, I noticed that the flannel service is not running on this machine. I tried to start it manually via docker, without success. It seems that flannel is the network service that should be running but is not.
Thanks in advance.
attachments
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-xxx-xxx-xxx-xxx.ec2.internal Ready node 33d v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal Ready master 33d v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal Ready node 33d v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal Ready master 33d v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal Ready node 33d v1.11.9
-
> sudo systemctl status kubelet
Jan 10 21:00:55 ip-xxx-xxx-xxx-xxx kubelet[2502]: I0110 21:00:55.026553 2502 kubelet_node_status.go:441] Recording NodeHasSufficientPID event message for node ip-xxx-xxx-xxx-xxx.ec2.internal
Jan 10 21:00:55 ip-xxx-xxx-xxx-xxx kubelet[2502]: I0110 21:00:55.027005 2502 kubelet_node_status.go:79] Attempting to register node ip-xxx-xxx-xxx-xxx.ec2.internal
Jan 10 21:00:55 ip-xxx-xxx-xxx-xxx kubelet[2502]: E0110 21:00:55.027764 2502 kubelet_node_status.go:103] Unable to register node "ip-xxx-xxx-xxx-xxx.ec2.internal" with API server: Post https://127.0.0.1/api/v1/nodes: dial tcp 127.0.0.1:443: connect: connection refused
-
> sudo docker logs k8s_kube-apiserver_kube-apiserver-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_134d55c1b1c3bf3583911989a14353da_16
F0110 20:59:35.581865 1 storage_decorator.go:57] Unable to create storage backend: config (&{etcd3 /registry [http://127.0.0.1:4001] true false 1000 0xc42013c480 <nil> 5m0s 1m0s}), err (dial tcp 127.0.0.1:4001: connect: connection refused)
-
> sudo docker version
Client:
Version: 17.03.2-ce
API version: 1.27
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 02:31:19 2017
OS/Arch: linux/amd64
Server:
Version: 17.03.2-ce
API version: 1.27 (minimum version 1.12)
Go version: go1.7.5
Git commit: f5ec1e2
Built: Tue Jun 27 02:31:19 2017
OS/Arch: linux/amd64
Experimental: false
-
> kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.9", GitCommit:"16236ce91790d4c75b79f6ce96841db1c843e7d2", GitTreeState:"clean", BuildDate:"2019-03-25T06:40:24Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server 127.0.0.1 was refused - did you specify the right host or port?
-
> sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
protokube 1.15.0 6b00e7216827 7 weeks ago 288 MB
k8s.gcr.io/kube-proxy v1.11.9 e18fcce798b8 9 months ago 98.1 MB
k8s.gcr.io/kube-controller-manager v1.11.9 634ccbd18a0f 9 months ago 155 MB
k8s.gcr.io/kube-apiserver v1.11.9 ef9a84756d40 9 months ago 187 MB
k8s.gcr.io/kube-scheduler v1.11.9 e00d30bd3a71 9 months ago 56.9 MB
k8s.gcr.io/pause-amd64 3.0 99e59f495ffa 3 years ago 747 kB
kopeio/etcd-manager 3.0.20190930 7937b67f722f 50 years ago 656 MB
-
> sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b4eb0ec9e6a2 k8s.gcr.io/kube-scheduler@sha256:372ab1014701f60b67a65d94f94d30d19335294d98746edcdfcb8808ed5aee3c "/bin/sh -c 'mkfif..." 15 hours ago Up 15 hours k8s_kube-scheduler_kube-scheduler-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_105cd5bac4edf48f265f31eb756b971a_0
8f827dc0eade kopeio/etcd-manager@sha256:cb0ed7c56dadbc0f4cd515906d72b30094229d6e0a9fcb7aa44e23680bf9a3a8 "/bin/sh -c 'mkfif..." 15 hours ago Up 15 hours k8s_etcd-manager_etcd-manager-main-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_a6a467f6b78a7c7bc15ec1f64799516d_0
5bebb169b8b3 k8s.gcr.io/kube-controller-manager@sha256:aa9b9dac085a65c47746fa8739cf70e9d7e9a356a836ad2ef073da0d7b136db2 "/bin/sh -c 'mkfif..." 15 hours ago Up 15 hours k8s_kube-controller-manager_kube-controller-manager-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_564bccf38cd14aa0f647593e69b159ab_0
4467d550824e k8s.gcr.io/kube-proxy@sha256:a63c81fe4d3e9575cc0a29c4866a2975b01a07c0f473ab2cf1e88ebf78739f80 "/bin/sh -c 'mkfif..." 15 hours ago Up 15 hours k8s_kube-proxy_kube-proxy-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_22cd6fe287e6f4bae556504b3245f385_0
0a5c23006e18 kopeio/etcd-manager@sha256:cb0ed7c56dadbc0f4cd515906d72b30094229d6e0a9fcb7aa44e23680bf9a3a8 "/bin/sh -c 'mkfif..." 15 hours ago Up 15 hours k8s_etcd-manager_etcd-manager-events-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_9f2a8de168741a0263161532f42e97b4_0
3efa9ae55618 k8s.gcr.io/pause-amd64:3.0 "/pause" 15 hours ago Up 15 hours k8s_POD_kube-proxy-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_22cd6fe287e6f4bae556504b3245f385_0
4e451bc007ac k8s.gcr.io/pause-amd64:3.0 "/pause" 15 hours ago Up 15 hours k8s_POD_kube-scheduler-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_105cd5bac4edf48f265f31eb756b971a_0
7c5c301e034a k8s.gcr.io/pause-amd64:3.0 "/pause" 15 hours ago Up 15 hours k8s_POD_kube-apiserver-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_134d55c1b1c3bf3583911989a14353da_0
d88f075fa61f k8s.gcr.io/pause-amd64:3.0 "/pause" 15 hours ago Up 15 hours k8s_POD_etcd-manager-main-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_a6a467f6b78a7c7bc15ec1f64799516d_0
69e8844e9c14 k8s.gcr.io/pause-amd64:3.0 "/pause" 15 hours ago Up 15 hours k8s_POD_kube-controller-manager-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_564bccf38cd14aa0f647593e69b159ab_0
05e67c2e8f98 k8s.gcr.io/pause-amd64:3.0 "/pause" 15 hours ago Up 15 hours k8s_POD_etcd-manager-events-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_9f2a8de168741a0263161532f42e97b4_0
eee0a4d563c0 protokube:1.15.0 "/usr/bin/protokub..." 15 hours ago Up 15 hours hungry_shirley
The kubelet is trying to register the us-east-1a master node with the API server endpoint https://127.0.0.1:443. I believe this should be the API server endpoint of one of the other two masters. The kubelet uses the kubelet.conf file to talk to the API server when registering the node. Change the server in the kubelet.conf file located at /etc/kubernetes to point to one of the below:
After changing kubelet.conf, restart the kubelet.
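Before editing the file, it may help to confirm which endpoint it currently names. This is a minimal sketch, assuming kubelet.conf is a standard kubeconfig containing a `server:` line. The `show_server` helper and the `KUBELET_CONF` variable are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Print the API server endpoint(s) named in a kubeconfig-style file.
# show_server and KUBELET_CONF are illustrative names, not part of any tool.
show_server() {
  grep -E '^[[:space:]]*server:' "$1"
}

# Path taken from the answer above; override KUBELET_CONF if yours differs.
conf="${KUBELET_CONF:-/etc/kubernetes/kubelet.conf}"
if [ -r "$conf" ]; then
  show_server "$conf" || echo "no server: line found in $conf"
else
  echo "cannot read $conf (try running as root)" >&2
fi
```

If the printed line shows https://127.0.0.1, the kubelet is talking to the local API server, which matches the "connection refused" error in the kubelet log.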
Edit: Since you are using etcd-manager, can you try the "Kubernetes service unavailable / flannel issues" troubleshooting step described here?
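The logs in the question show a chain of refused connections: the kubelet cannot reach the API server on 127.0.0.1:443, and the API server cannot reach etcd on 127.0.0.1:4001. A rough way to see which link is down on the broken master is to probe both ports locally. This is only a sketch: `check_port` is a hypothetical helper, and it relies on bash's `/dev/tcp` feature, so it needs bash rather than a plain POSIX sh.

```shell
#!/usr/bin/env bash
# Report whether a TCP port accepts connections.
# check_port is an illustrative helper; ports 4001 and 443 come from the logs above.
check_port() {
  local host="$1" port="$2"
  # bash's /dev/tcp pseudo-device attempts a TCP connect; the subshell
  # ensures the file descriptor is closed again when the check finishes.
  if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo "${host}:${port} open"
  else
    echo "${host}:${port} refused or unreachable"
    return 1
  fi
}

check_port 127.0.0.1 4001 || true   # etcd client endpoint the API server dials
check_port 127.0.0.1 443 || true    # API server endpoint the kubelet dials
```

If 4001 is refused while the etcd-manager containers are up, their output (sudo docker logs with the container name from docker ps) is probably the next place to look.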