
Calico: Kubernetes pods can't ping each other using cluster IPs

I installed Kubernetes using kubeadm v1.14.0 and added two worker nodes via the join command.

kubeadm config:

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.14.0
controlPlaneEndpoint: "172.22.203.12:6443"
networking:
    podSubnet: "111.111.0.0/16"

Node list:

NAME    STATUS   ROLES    AGE   VERSION
linan   Ready    <none>   13h   v1.14.0
node2   Ready    <none>   13h   v1.14.0
yiwu    Ready    master   13h   v1.14.0

I checked that all the system pods are up and running:

kubectl get pods -n kube-system

NAME                            READY   STATUS    RESTARTS   AGE
calico-node-h49t9               2/2     Running   1          13h
calico-node-mplwx               2/2     Running   0          13h
calico-node-twvsd               2/2     Running   0          13h
calico-typha-666749994b-d68qg   1/1     Running   0          13h
coredns-8567978547-dhbn4        1/1     Running   0          14h
coredns-8567978547-zv5w5        1/1     Running   0          14h
etcd-yiwu                       1/1     Running   0          13h
kube-apiserver-yiwu             1/1     Running   0          13h
kube-controller-manager-yiwu    1/1     Running   0          13h
kube-proxy-7pjcx                1/1     Running   0          13h
kube-proxy-96d2j                1/1     Running   0          13h
kube-proxy-j5cnw                1/1     Running   0          14h
kube-scheduler-yiwu             1/1     Running   0          13h

These are the two pods I used for testing:

kubectl get pods -owide

NAME             READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
nginx-ds-2br6j   1/1     Running   0          13h   111.111.1.2   linan   <none>           <none>
nginx-ds-t7sfv   1/1     Running   0          13h   111.111.2.2   node2   <none>           <none>

But I can't ping the pod IPs from other nodes (including the master), and I can't access the services the pods provide.

[root@YiWu ~]# ping 111.111.1.2
PING 111.111.1.2 (111.111.1.2) 56(84) bytes of data.
^C
--- 111.111.1.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms

[root@YiWu ~]# ping 111.111.2.2
PING 111.111.2.2 (111.111.2.2) 56(84) bytes of data.
^C
--- 111.111.2.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms

Each node can only reach the pods running on that same node.

I checked the calico-node log on each node. The following messages appear on some nodes but not on others.

YiWu

bird: BGP: Unexpected connect from unknown address 172.19.0.1 (port 56754)
bird: BGP: Unexpected connect from unknown address 172.19.0.1 (port 40364)

node2

bird: BGP: Unexpected connect from unknown address 172.22.203.11 (port 57996)
bird: BGP: Unexpected connect from unknown address 172.22.203.11 (port 59485)

linan

(no such messages)

I installed calicoctl and checked the node status from the YiWu node:

DATASTORE_TYPE=kubernetes KUBECONFIG=~/.kube/config calicoctl get node -owide

NAME    ASN         IPV4            IPV6   
linan   (unknown)   172.18.0.1/16          
node2   (unknown)   172.20.0.1/16          
yiwu    (unknown)   172.19.0.1/16

DATASTORE_TYPE=kubernetes KUBECONFIG=~/.kube/config calicoctl node status

Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |              INFO              |
+--------------+-------------------+-------+----------+--------------------------------+
| 172.18.0.1   | node-to-node mesh | start | 12:23:15 | Connect                        |
| 172.20.0.1   | node-to-node mesh | start | 12:23:18 | OpenSent Socket: Connection    |
|              |                   |       |          | closed                         |
+--------------+-------------------+-------+----------+--------------------------------+

IPv6 BGP status
No IPv6 peers found.

EDIT

sysctl -p  /etc/sysctl.d/kubernetes.conf 

net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
vm.swappiness = 0
vm.overcommit_memory = 1
vm.panic_on_oom = 0
fs.inotify.max_user_watches = 89100

IP forwarding is already enabled on all nodes.

Asked by Cliven on Aug 15 '19 at 02:08


2 Answers

I restarted Calico and checked its log:

kubectl delete -f /etc/kubernetes/addons/calico.yaml
kubectl apply -f /etc/kubernetes/addons/calico.yaml
kubectl get pods -n kube-system
kubectl logs calico-node-dp69k -c calico-node -n kube-system

Here calico-node-dp69k is the name of the calico-node pod. In its log I found that Calico had picked an unexpected network interface at startup:

2019-08-15 04:39:10.859 [INFO][8] startup.go 564: Using autodetected IPv4 address on interface br-b733428777f6: 172.19.0.1/16

Obviously br-b733428777f6 is not the interface I expected.
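One quick way to spot this kind of mismatch is to compare the address Calico registered for each node with the subnet the hosts actually use. The following is only a minimal sketch: the node addresses are copied from the `calicoctl get node -owide` output in the question, and the host subnet 172.22.203.0/24 is an assumption (the control-plane endpoint 172.22.203.12 lives there), so adjust it for your network.

```python
import ipaddress

# Assumed host subnet; adjust for your own network.
HOST_SUBNET = ipaddress.ip_network("172.22.203.0/24")

# NAME and IPV4 columns copied from `calicoctl get node -owide` in the question.
CALICO_NODES = {
    "linan": "172.18.0.1/16",
    "node2": "172.20.0.1/16",
    "yiwu": "172.19.0.1/16",
}


def nodes_outside_subnet(nodes, subnet):
    """Return the names of nodes whose Calico IPv4 address is not in `subnet`."""
    return sorted(
        name
        for name, cidr in nodes.items()
        if ipaddress.ip_interface(cidr).ip not in subnet
    )


print(nodes_outside_subnet(CALICO_NODES, HOST_SUBNET))
```

Every node listed by this check registered an address that its peers cannot route to, which would be consistent with the BGP sessions never reaching the established state.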

I checked the Calico configuration docs for IP_AUTODETECTION_METHOD.

By default, Calico uses the first-found method to select the network interface:

The first-found option enumerates all interface IP addresses and returns the first valid IP address (based on IP version and type of address) on the first valid interface.

In my case, can-reach is more suitable.
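As the Calico docs describe it, can-reach relies on the host's own routing to decide which local IP address would be used to reach a given destination. The idea can be illustrated with a few lines of Python. This is a sketch of the general technique, not Calico's actual code; the function name and the default port are made up for illustration.

```python
import socket


def local_address_for(destination, port=80):
    """Return the local IPv4 address the kernel would use to reach `destination`.

    Connecting a UDP socket sends no packets. It only makes the kernel
    choose a route and, with it, a source address.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((destination, port))
        return sock.getsockname()[0]
```

Calling `local_address_for("172.22.203.1")` on a node should return that node's address on the network that routes to the gateway, regardless of which bridge or virtual interface happens to be enumerated first.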

So I edited calico.yaml and added IP_AUTODETECTION_METHOD to the calico-node container, like this:

spec:
  hostNetwork: true
  serviceAccountName: calico-node
  terminationGracePeriodSeconds: 0
  containers:
    - name: calico-node
      image: quay.io/calico/node:v3.1.3
      env:
        - name: IP_AUTODETECTION_METHOD
          value: can-reach=172.22.203.1

In can-reach=172.22.203.1, the address 172.22.203.1 is my gateway IP. Then I redeployed Calico:

kubectl delete -f /etc/kubernetes/addons/calico.yaml
kubectl apply -f /etc/kubernetes/addons/calico.yaml

Check the log again:

2019-08-15 04:50:27.942 [INFO][10] reachaddr.go 46: Auto-detected address by connecting to remote Destination="172.22.203.1" IP=172.22.203.10
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 55: Checking interface CIDRs Name="cali7b8c9bd2e1f"
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 55: Checking interface CIDRs Name="veth24c7125"
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 55: Checking interface CIDRs Name="br-0b07d34c53b5"
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 57: Checking CIDR CIDR="172.18.0.1/16"
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 55: Checking interface CIDRs Name="tunl0"
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 57: Checking CIDR CIDR="111.111.1.1/32"
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 55: Checking interface CIDRs Name="docker0"
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 57: Checking CIDR CIDR="172.17.0.1/16"
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 55: Checking interface CIDRs Name="enp0s20u1u5"
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 55: Checking interface CIDRs Name="eno4"
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 55: Checking interface CIDRs Name="eno3"
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 55: Checking interface CIDRs Name="eno2"
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 55: Checking interface CIDRs Name="eno1"
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 57: Checking CIDR CIDR="172.22.203.10/24"
2019-08-15 04:50:27.943 [INFO][10] reachaddr.go 59: Found matching interface CIDR CIDR="172.22.203.10/24"
2019-08-15 04:50:27.943 [INFO][10] startup.go 590: Using autodetected IPv4 address 172.22.203.10/24, detected by connecting to 172.22.203.1
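With several nodes it can be tedious to read each log by eye. The sketch below pulls the autodetected address out of saved calico-node log text. It assumes the two startup.go message formats quoted in this answer; other Calico versions may word the message differently, and the function name is made up for illustration.

```python
import re

# Matches both "Using autodetected IPv4 address ..." formats quoted in this answer.
AUTODETECT_RE = re.compile(
    r"Using autodetected IPv4 address (?:on interface (\S+): )?"
    r"(\d+\.\d+\.\d+\.\d+/\d+)"
)


def autodetected_address(log_text):
    """Return (interface or None, cidr) from the last autodetection message, or None."""
    matches = AUTODETECT_RE.findall(log_text)
    if not matches:
        return None
    interface, cidr = matches[-1]
    return (interface or None, cidr)


BEFORE = "startup.go 564: Using autodetected IPv4 address on interface br-b733428777f6: 172.19.0.1/16"
AFTER = "startup.go 590: Using autodetected IPv4 address 172.22.203.10/24, detected by connecting to 172.22.203.1"

print(autodetected_address(BEFORE))
print(autodetected_address(AFTER))
```

The log text itself could come from `kubectl logs <calico-node pod> -c calico-node -n kube-system`, saved to a file or captured from a subprocess.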

Now it chooses the right network interface.

I then checked whether the pod IPs were reachable from the other nodes, and they were.

Done.

Answered by Cliven on Oct 23 '22 at 13:10


For future Googlers: in my case, I used the operator:

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
      - blockSize: 26
        cidr: 10.244.0.0/16 # your pod cidr 
        encapsulation: VXLANCrossSubnet
        natOutgoing: Enabled
        nodeSelector: all()
    nodeAddressAutodetectionV4:
      interface: ens* # Change this one to fix the autodetected issue. My interface is ensxxx
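One caveat worth checking: the Calico docs describe the interface value as a regular expression (Go syntax), not a shell glob. The sketch below assumes the match is unanchored, which is an assumption and not something confirmed here, and uses Python's `re` as a stand-in for Go's regexp (the two behave the same for patterns this simple). The interface names come from the calico-node log in the other answer, plus a hypothetical ens192.

```python
import re


def interface_matches(pattern, name):
    """Unanchored regex match; assumed to mirror how the pattern is applied."""
    return re.search(pattern, name) is not None


# eno1, docker0, br-0b07d34c53b5 and tunl0 appear in the log above; ens192 is hypothetical.
NAMES = ["ens192", "eno1", "docker0", "br-0b07d34c53b5", "tunl0"]

for pattern in ("ens*", "ens.*"):
    print(pattern, [n for n in NAMES if interface_matches(pattern, n)])
```

Under that assumption, `ens*` means "en followed by zero or more s", so it would also select eno1, while `ens.*` selects only names containing "ens". If your hosts have both kinds of interface, the stricter pattern may be the safer choice.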

If it still doesn't work, the cause may be that you installed another CNI such as flannel or cilium earlier.

In that case you need to remove its leftover network interfaces first. List the interfaces with:

ip link

For each leftover interface (flannel's, for example), do the following:

ifconfig <name of interface from ip link> down

ip link delete <name of interface from ip link>
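On a host with many interfaces, picking out the leftovers by hand is error-prone. The sketch below filters the one-line-per-interface output of `ip -o link` by name prefix and prints the matching delete commands without running them. The prefixes (flannel, cni, cilium) and the sample output are assumptions for illustration, so verify the names on your own hosts before deleting anything.

```python
import re

# Assumed name prefixes for interfaces left behind by other CNIs; verify on your hosts.
LEFTOVER_PREFIXES = ("flannel", "cni", "cilium")


def leftover_interfaces(ip_link_output, prefixes=LEFTOVER_PREFIXES):
    """Return interface names from `ip -o link` output that start with one of `prefixes`."""
    names = []
    for line in ip_link_output.splitlines():
        # Each line looks like "4: flannel.1: <BROADCAST,...> mtu 1450 ..."
        match = re.match(r"\d+:\s+([^:@\s]+)", line)
        if match and match.group(1).startswith(prefixes):
            names.append(match.group(1))
    return names


# Hypothetical sample output, not taken from a real host.
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
"""

for name in leftover_interfaces(SAMPLE):
    print(f"ip link delete {name}")
```

Printing the commands instead of executing them leaves a chance to review the list first.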

Answered by maxisam on Oct 23 '22 at 11:10