Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

kube-dns error: reply from unexpected source

I have a strange error with kube-dns.

Environment: Cluster with a single master and a few nodes deployed on AWS with kops. Kubernetes version 1.8.4.

Problem is that I have flakiness in DNS name resolution (both cluster-internal or external names) in my pods. After troubleshooting I understood the problem arises only when pods are scheduled on a specific node, which is the one where one of the replicas of the kube-dns pod is running.

These are my kube-dns pods:

$ kubectl -n kube-system get po -l k8s-app=kube-dns -o wide
NAME                        READY     STATUS    RESTARTS   AGE       IP             NODE
kube-dns-7f56f9f8c7-2ztbn   3/3       Running   0          2d        100.96.8.239   node01
kube-dns-7f56f9f8c7-h5w29   3/3       Running   0          17d       100.96.7.114   node02

If I run a test POD forcing it to run on node02 everything seems fine. I can resolve any (valid) DNS name with no issues at all.

If I run the same test POD on node01 name resolution is flaky: sometimes it fails (roughly 50% of the times) with the following error

$ dig google.com
;; reply from unexpected source: 100.96.8.239#53, expected 100.64.0.10#53

The rest of the times it works flawlessly:

$ dig google.com

; <<>> DiG 9.10.4-P3 <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24454
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             60      IN      A       209.85.202.100
google.com.             60      IN      A       209.85.202.101
google.com.             60      IN      A       209.85.202.102
google.com.             60      IN      A       209.85.202.113
google.com.             60      IN      A       209.85.202.138
google.com.             60      IN      A       209.85.202.139

;; Query time: 2 msec
;; SERVER: 100.64.0.10#53(100.64.0.10)
;; WHEN: Mon Jan 08 10:46:42 UTC 2018
;; MSG SIZE  rcvd: 135

/etc/resolv.conf points correctly to the kube-dns service's IP address:

$ head -n 1 /etc/resolv.conf 
nameserver 100.64.0.10

$ kubectl -n kube-system get svc kube-dns 
NAME       TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)         AGE
kube-dns   ClusterIP   100.64.0.10   <none>        53/UDP,53/TCP   33d

Apparently, on node01 only, when the request is forwarded by the service to the kube-dns pod instance that runs on node01 itself the error is triggered.

I tried restarting kube-proxy on node01 but the problem remains.

I bet that rebooting/recreating node01 would make the problem go away, but I need to ensure this problem doesn't happen again.

Does anybody have an idea what's going on?

like image 573
whites11 Avatar asked Mar 07 '23 04:03

whites11


1 Answers

I found an issue on github that looks very similar to this one I am having, and the solution posted there seems to work.

Basically, I needed to load a kernel module with the following command:

modprobe br_netfilter

Of course, YMMV

like image 70
whites11 Avatar answered Mar 18 '23 18:03

whites11