I have Kubernetes installed on Container Linux by CoreOS alpha (1353.1.0) using hyperkube v1.5.5_coreos.0 using my fork of coreos-kubernetes install scripts at https://github.com/kfirufk/coreos-kubernetes.
I have two ContainerOS machines.
kubectl get pods --all-namespaces
returns
NAMESPACE NAME READY STATUS RESTARTS AGE
ceph ceph-mds-2743106415-rkww4 0/1 Pending 0 1d
ceph ceph-mon-check-3856521781-bd6k5 1/1 Running 0 1d
kube-lego kube-lego-3323932148-g2tf4 1/1 Running 0 1d
kube-system calico-node-xq6j7 2/2 Running 0 1d
kube-system calico-node-xzpp2 2/2 Running 4560 1d
kube-system calico-policy-controller-610849172-b7xjr 1/1 Running 0 1d
kube-system heapster-v1.3.0-beta.0-2754576759-v1f50 2/2 Running 0 1d
kube-system kube-apiserver-192.168.1.2 1/1 Running 0 1d
kube-system kube-controller-manager-192.168.1.2 1/1 Running 1 1d
kube-system kube-dns-3675956729-r7hhf 3/4 Running 3924 1d
kube-system kube-dns-autoscaler-505723555-l2pph 1/1 Running 0 1d
kube-system kube-proxy-192.168.1.2 1/1 Running 0 1d
kube-system kube-proxy-192.168.1.3 1/1 Running 0 1d
kube-system kube-scheduler-192.168.1.2 1/1 Running 1 1d
kube-system kubernetes-dashboard-3697905830-vdz23 1/1 Running 1246 1d
kube-system monitoring-grafana-4013973156-m2r2v 1/1 Running 0 1d
kube-system monitoring-influxdb-651061958-2mdtf 1/1 Running 0 1d
nginx-ingress default-http-backend-150165654-s4z04 1/1 Running 2 1d
so I can see that kube-dns-782804071-h78rf
keeps restarting.
kubectl describe pod kube-dns-3675956729-r7hhf --namespace=kube-system
returns:
Name: kube-dns-3675956729-r7hhf
Namespace: kube-system
Node: 192.168.1.2/192.168.1.2
Start Time: Sat, 11 Mar 2017 17:54:14 +0000
Labels: k8s-app=kube-dns
pod-template-hash=3675956729
Status: Running
IP: 10.2.67.243
Controllers: ReplicaSet/kube-dns-3675956729
Containers:
kubedns:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:kubedns
Image: gcr.io/google_containers/kubedns-amd64:1.9
Image ID: rkt://sha512-c7b7c9c4393bea5f9dc5bcbe1acf1036c2aca36ac14b5e17fd3c675a396c4219
Ports: 10053/UDP, 10053/TCP, 10055/TCP
Args:
--domain=cluster.local.
--dns-port=10053
--config-map=kube-dns
--v=2
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: False
Restart Count: 981
Liveness: http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables:
PROMETHEUS_PORT: 10055
dnsmasq:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:dnsmasq
Image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4.1
Image ID: rkt://sha512-8c5f8b40f6813bb676ce04cd545c55add0dc8af5a3be642320244b74ea03f872
Ports: 53/UDP, 53/TCP
Args:
--cache-size=1000
--no-resolv
--server=127.0.0.1#10053
--log-facility=-
Requests:
cpu: 150m
memory: 10Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: True
Restart Count: 981
Liveness: http-get http://:8080/healthz-dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables: <none>
dnsmasq-metrics:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:dnsmasq-metrics
Image: gcr.io/google_containers/dnsmasq-metrics-amd64:1.0.1
Image ID: rkt://sha512-ceb3b6af1cd67389358be14af36b5e8fb6925e78ca137b28b93e0d8af134585b
Port: 10054/TCP
Args:
--v=2
--logtostderr
Requests:
memory: 10Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: True
Restart Count: 981
Liveness: http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables: <none>
healthz:
Container ID: rkt://f6480fe7-4316-4e0e-9483-0944feb85ea3:healthz
Image: gcr.io/google_containers/exechealthz-amd64:v1.2.0
Image ID: rkt://sha512-3a85b0533dfba81b5083a93c7e091377123dac0942f46883a4c10c25cf0ad177
Port: 8080/TCP
Args:
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
--url=/healthz-dnsmasq
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
--url=/healthz-kubedns
--port=8080
--quiet
Limits:
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
State: Running
Started: Sun, 12 Mar 2017 17:47:41 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 12 Mar 2017 17:46:28 +0000
Finished: Sun, 12 Mar 2017 17:47:02 +0000
Ready: True
Restart Count: 981
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-zqbdp (ro)
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-zqbdp:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-zqbdp
QoS Class: Burstable
Tolerations: CriticalAddonsOnly=:Exists
No events.
which shows that kubedns-amd64:1.9
is in Ready: false
this is my kude-dns-de.yaml
file:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
spec:
strategy:
rollingUpdate:
maxSurge: 10%
maxUnavailable: 0
selector:
matchLabels:
k8s-app: kube-dns
template:
metadata:
labels:
k8s-app: kube-dns
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
spec:
containers:
- name: kubedns
image: gcr.io/google_containers/kubedns-amd64:1.9
resources:
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
livenessProbe:
httpGet:
path: /healthz-kubedns
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
readinessProbe:
httpGet:
path: /readiness
port: 8081
scheme: HTTP
initialDelaySeconds: 3
timeoutSeconds: 5
args:
- --domain=cluster.local.
- --dns-port=10053
- --config-map=kube-dns
# This should be set to v=2 only after the new image (cut from 1.5) has
# been released, otherwise we will flood the logs.
- --v=2
env:
- name: PROMETHEUS_PORT
value: "10055"
ports:
- containerPort: 10053
name: dns-local
protocol: UDP
- containerPort: 10053
name: dns-tcp-local
protocol: TCP
- containerPort: 10055
name: metrics
protocol: TCP
- name: dnsmasq
image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4.1
livenessProbe:
httpGet:
path: /healthz-dnsmasq
port: 8080
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- --cache-size=1000
- --no-resolv
- --server=127.0.0.1#10053
- --log-facility=-
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
# see: https://github.com/kubernetes/kubernetes/issues/29055 for details
resources:
requests:
cpu: 150m
memory: 10Mi
- name: dnsmasq-metrics
image: gcr.io/google_containers/dnsmasq-metrics-amd64:1.0.1
livenessProbe:
httpGet:
path: /metrics
port: 10054
scheme: HTTP
initialDelaySeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
args:
- --v=2
- --logtostderr
ports:
- containerPort: 10054
name: metrics
protocol: TCP
resources:
requests:
memory: 10Mi
- name: healthz
image: gcr.io/google_containers/exechealthz-amd64:v1.2.0
resources:
limits:
memory: 50Mi
requests:
cpu: 10m
memory: 50Mi
args:
- --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
- --url=/healthz-dnsmasq
- --cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
- --url=/healthz-kubedns
- --port=8080
- --quiet
ports:
- containerPort: 8080
protocol: TCP
dnsPolicy: Default
and this is my kube-dns-svc.yaml
:
apiVersion: v1
kind: Service
metadata:
name: kube-dns
namespace: kube-system
labels:
k8s-app: kube-dns
kubernetes.io/cluster-service: "true"
kubernetes.io/name: "KubeDNS"
spec:
selector:
k8s-app: kube-dns
clusterIP: 10.3.0.10
ports:
- name: dns
port: 53
protocol: UDP
- name: dns-tcp
port: 53
protocol: TCP
any information regarding the issue would be greatly appreciated!
rkt list --full 2> /dev/null | grep kubedns
shows:
744a4579-0849-4fae-b1f5-cb05d40f3734 kubedns gcr.io/google_containers/kubedns-amd64:1.9 sha512-c7b7c9c4393b running 2017-03-22 22:14:55.801 +0000 UTC 2017-03-22 22:14:56.814 +0000 UTC
journalctl -m _MACHINE_ID=744a45790849b1f5cb05d40f3734
provides:
Mar 22 22:17:58 kube-dns-3675956729-sthcv kubedns[8]: E0322 22:17:58.619254 8 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: connect: network is unreachable
I tried to add - --proxy-mode=userspace
to /etc/kubernetes/manifests/kube-proxy.yaml
but the results are the same.
kubectl get svc --all-namespaces
provides:
NAMESPACE NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ceph ceph-mon None <none> 6789/TCP 1h
default kubernetes 10.3.0.1 <none> 443/TCP 1h
kube-system heapster 10.3.0.2 <none> 80/TCP 1h
kube-system kube-dns 10.3.0.10 <none> 53/UDP,53/TCP 1h
kube-system kubernetes-dashboard 10.3.0.116 <none> 80/TCP 1h
kube-system monitoring-grafana 10.3.0.187 <none> 80/TCP 1h
kube-system monitoring-influxdb 10.3.0.214 <none> 8086/TCP 1h
nginx-ingress default-http-backend 10.3.0.233 <none> 80/TCP 1h
kubectl get cs
provides:
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
my kube-proxy.yaml
has the following content:
apiVersion: v1
kind: Pod
metadata:
name: kube-proxy
namespace: kube-system
annotations:
rkt.alpha.kubernetes.io/stage1-name-override: coreos.com/rkt/stage1-fly
spec:
hostNetwork: true
containers:
- name: kube-proxy
image: quay.io/coreos/hyperkube:v1.5.5_coreos.0
command:
- /hyperkube
- proxy
- --cluster-cidr=10.2.0.0/16
- --kubeconfig=/etc/kubernetes/controller-kubeconfig.yaml
securityContext:
privileged: true
volumeMounts:
- mountPath: /etc/ssl/certs
name: "ssl-certs"
- mountPath: /etc/kubernetes/controller-kubeconfig.yaml
name: "kubeconfig"
readOnly: true
- mountPath: /etc/kubernetes/ssl
name: "etc-kube-ssl"
readOnly: true
- mountPath: /var/run/dbus
name: dbus
readOnly: false
volumes:
- hostPath:
path: "/usr/share/ca-certificates"
name: "ssl-certs"
- hostPath:
path: "/etc/kubernetes/controller-kubeconfig.yaml"
name: "kubeconfig"
- hostPath:
path: "/etc/kubernetes/ssl"
name: "etc-kube-ssl"
- hostPath:
path: /var/run/dbus
name: dbus
this is all the valuable information I could find. any ideas? :)
output of iptables-save
on the controller ContainerOS at http://pastebin.com/2GApCj0n
I ran curl on the controller node
# curl https://10.3.0.1 --insecure
Unauthorized
means it can access it properly, i didn't add enough parameters for it to be authorized right ?
thanks to @jaxxstorm I removed calico manifests, updated their quay/cni and quay/node versions and reinstalled them.
now kubedns keeps restarting, but I think that now calico works. because for the first time it tries to install kubedns on the worker node and not on the controller node, and also when I rkt enter
the kubedns pod and try to wget https://10.3.0.1
I get:
# wget https://10.3.0.1
Connecting to 10.3.0.1 (10.3.0.1:443)
wget: can't execute 'ssl_helper': No such file or directory
wget: error getting response: Connection reset by peer
which clearly shows that there is some kind of response. which is good right ?
now kubectl get pods --all-namespaces
shows:
kube-system kube-dns-3675956729-ljz2w 4/4 Running 88 42m
so.. 4/4 ready but it keeps restarting.
kubectl describe pod kube-dns-3675956729-ljz2w --namespace=kube-system
output at http://pastebin.com/Z70U331G
so it can't connect to http://10.2.47.19:8081/readiness, i'm guessing this is the ip of kubedns since it uses port 8081. don't know how to continue investigating this issue further.
thanks for everything!
Lots of great debugging info here, thanks!
This is the clincher:
# curl https://10.3.0.1 --insecure
Unauthorized
You got an unauthorized response because you didn't pass a client cert, but that's fine, it's not what we're after. This proves that kube-proxy is working as expected and is accessible. Your rkt logs:
Mar 22 22:17:58 kube-dns-3675956729-sthcv kubedns[8]: E0322 22:17:58.619254 8 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.3.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.3.0.1:443: connect: network is unreachable
Are indicating that the containers that there's network connectivity issues inside the containers, which indicates to me that you haven't configured container networking/CNI.
Please have a read through this document: https://coreos.com/rkt/docs/latest/networking/overview.html
You may also have to reconfigure calico, there's some more information that here: http://docs.projectcalico.org/master/getting-started/rkt/
kube-dns has a readiness probe that tries resolving trough the Service IP of kube-dns. Is it possible that there is a problem with your Service network?
Check out the answer and solution here: kubernetes service IPs not reachable
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With