 

How is the cluster IP in kubernetes-aws configured?

I am very new to kubernetes and have just got a stock kubernetes v1.3.5 cluster up on AWS using kube-up. So far, I have been playing around with kubernetes to understand its mechanics (nodes, pods, svc and stuff). Based on my initial (or maybe crude) understanding, I have a few questions:

1) How does routing to the cluster IP work here (i.e. in kube-aws)? I see that the services have IPs in the range 10.0.0.0/16. I did a deployment with rc=3 of stock nginx and then attached a service to it with a NodePort exposed. All works great! I can connect to the service from my dev machine. This nginx service has a cluster IP of 10.0.33.71:1321. Now, if I ssh into one of the minions (or nodes, or VMs) and do a "telnet 10.0.33.71 1321", it connects as expected. But I am clueless as to how this works: I couldn't find any routes related to 10.0.0.0/16 in the VPC set up by kubernetes. What exactly happens under the hood here that results in a successful connection for an app like telnet? However, if I ssh into the master node and do "telnet 10.0.33.71 1321", it does not connect. Why does it fail to connect from the master?

2) There is a cbr0 interface inside each node. Each minion node has cbr0 configured as 10.244.x.0/24 and the master has cbr0 as 10.246.0.0/24. I can ping any of the 10.244.x.x pods from any of the nodes (including the master). But I am not able to ping 10.246.0.1 (cbr0 inside the master node) from any of the minion nodes. What could be happening here?

Here are the routes set up by kubernetes in the AWS VPC.

Destination     Target
172.20.0.0/16   local
0.0.0.0/0       igw-<hex value>
10.244.0.0/24   eni-<hex value> / i-<hex value>
10.244.1.0/24   eni-<hex value> / i-<hex value>
10.244.2.0/24   eni-<hex value> / i-<hex value>
10.244.3.0/24   eni-<hex value> / i-<hex value>
10.244.4.0/24   eni-<hex value> / i-<hex value>
10.246.0.0/24   eni-<hex value> / i-<hex value>
asked by soupybionics on Aug 26 '16 at 06:08



1 Answer

Mark Betz (SRE at Olark) presents Kubernetes networking in three articles:

  • pods
  • services
  • ingress

For a pod, you are looking at:

(figure: pod network)

You find:

  • eth0: a "physical network interface"
  • docker0/cbr0: a bridge for connecting two ethernet segments no matter their protocol.
  • veth0, 1, 2: virtual network interfaces, one per container.
    docker0 is the default gateway of veth0. In Kubernetes the bridge is named cbr0, for "custom bridge".
    Kubernetes starts the containers of a pod so that they share the same veth0, which means each container must expose different ports.
  • pause: a special container started in "pause", to detect SIGTERM sent to a pod, and forward it to the containers.
  • node: a host
  • cluster: a group of nodes
  • router/gateway

The last element is where things start to be more complex:

Kubernetes assigns an overall address space for the bridges on each node, and then assigns the bridges addresses within that space, based on the node the bridge is built on.
Secondly, it adds routing rules to the gateway at 10.100.0.1 telling it how packets destined for each bridge should be routed, i.e. which node’s eth0 the bridge can be reached through.

Such a combination of virtual network interfaces, bridges, and routing rules is usually called an overlay network.
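To make the address assignment concrete, here is a minimal sketch of the idea, not what Kubernetes actually runs. It assumes the 10.244.0.0/16 pod range seen in the question's route table, one /24 per node, and hypothetical node names; the two helper functions are made up for illustration.

```python
import ipaddress

# Assumed for illustration: overall pod address space and node names are hypothetical.
POD_CIDR = ipaddress.ip_network("10.244.0.0/16")


def allocate_bridge_subnets(node_names, prefix=24):
    """Give each node's bridge one slice of the overall pod address space."""
    subnets = POD_CIDR.subnets(new_prefix=prefix)
    return {name: next(subnets) for name in node_names}


def node_for_pod_ip(routes, pod_ip):
    """Return the node whose bridge subnet contains pod_ip, or None if no route matches."""
    addr = ipaddress.ip_address(pod_ip)
    for node, subnet in routes.items():
        if addr in subnet:
            return node
    return None


routes = allocate_bridge_subnets(["node-a", "node-b", "node-c"])
for node, subnet in routes.items():
    print(f"{subnet} -> {node}")

# A packet for this pod IP has to be sent to the node that owns its /24.
print(node_for_pod_ip(routes, "10.244.1.17"))
```

The resulting table has the same shape as the VPC routes in the question: one bridge subnet per node, each pointing at that node's interface.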

When a pod contacts another pod, it goes through a service.
Why?

Pod networking in a cluster is neat stuff, but by itself it is insufficient to enable the creation of durable systems. That’s because pods in Kubernetes are ephemeral.
You can use a pod IP address as an endpoint but there is no guarantee that the address won’t change the next time the pod is recreated, which might happen for any number of reasons.

That means you need a reverse proxy/dynamic load balancer, and it had better be resilient.

A service is a type of kubernetes resource that causes a proxy to be configured to forward requests to a set of pods.
The set of pods that will receive traffic is determined by the selector, which matches labels assigned to the pods when they were created.
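The selector logic can be sketched roughly as follows. This is only an illustration of equality-based label matching: the pod names, IPs and labels are invented, and the sketch assumes a non-empty selector.

```python
def matches(selector, labels):
    """A pod matches when every key/value pair of the selector is present in its labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints_for(selector, pods):
    """Collect the IPs of the pods that the service would forward traffic to."""
    return [pod["ip"] for pod in pods if matches(selector, pod["labels"])]


# Hypothetical pods, for illustration only.
pods = [
    {"name": "web-1", "ip": "10.244.1.5", "labels": {"app": "nginx", "tier": "web"}},
    {"name": "web-2", "ip": "10.244.2.7", "labels": {"app": "nginx", "tier": "web"}},
    {"name": "db-1", "ip": "10.244.3.9", "labels": {"app": "mysql"}},
]

print(endpoints_for({"app": "nginx"}, pods))
```

Extra labels on a pod do not prevent a match; only the keys named in the selector are compared.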

That service uses its own network. By default, its type is "ClusterIP"; it has its own IP.

Here is the communication path between two pods:

(figure: two pods network)

It uses kube-proxy.
This proxy itself relies on netfilter.

netfilter is a rules-based packet processing engine.
It runs in kernel space and gets a look at every packet at various points in its life cycle.
It matches packets against rules and when it finds a rule that matches it takes the specified action.
Among the many actions it can take is redirecting the packet to another destination.

(figure: kube-proxy and netfilter)

In this mode, kube-proxy:

  • opens a port (10400 in the example above) on the local host interface to listen for requests to the test-service,
  • inserts netfilter rules to reroute packets destined for the service IP to its own port, and
  • forwards those requests to a pod on port 8080.

That is how a request to 10.3.241.152:80 magically becomes a request to 10.0.2.2:8080.
Given the capabilities of netfilter all that’s required to make this all work for any service is for kube-proxy to open a port and insert the correct netfilter rules for that service, which it does in response to notifications from the master api server of changes in the cluster.
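As a rough mental model of this user space mode, here is a small simulation. It is not kube-proxy code: the class and method names are made up, round-robin backend selection is an assumption, and the second pod address is hypothetical (the other values come from the example above).

```python
from itertools import cycle


class UserspaceProxy:
    """Toy model: a redirect rule sends service traffic to a local port, which forwards to pods."""

    def __init__(self):
        self.redirects = {}  # (service_ip, service_port) -> local proxy port
        self.backends = {}   # local proxy port -> round-robin iterator over pod endpoints

    def add_service(self, service_ip, service_port, proxy_port, pod_endpoints):
        # Stands in for "open a port" plus "insert a netfilter rule".
        self.redirects[(service_ip, service_port)] = proxy_port
        self.backends[proxy_port] = cycle(pod_endpoints)

    def route(self, dst_ip, dst_port):
        proxy_port = self.redirects.get((dst_ip, dst_port))
        if proxy_port is None:
            return (dst_ip, dst_port)  # no rule matched: the packet is left untouched
        return next(self.backends[proxy_port])


proxy = UserspaceProxy()
proxy.add_service(
    "10.3.241.152", 80, proxy_port=10400,
    pod_endpoints=[("10.0.2.2", 8080), ("10.0.2.3", 8080)],  # second pod is hypothetical
)

print(proxy.route("10.3.241.152", 80))
print(proxy.route("10.3.241.152", 80))
```

Traffic to any address without a matching rule passes through unchanged, which is why only service IPs are affected.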

But:

There’s one more little twist to the tale.
I mentioned above that user space proxying is expensive due to marshaling packets. In kubernetes 1.2, kube-proxy gained the ability to run in iptables mode.

In this mode, kube-proxy mostly ceases to be a proxy for inter-cluster connections, and instead delegates to netfilter the work of detecting packets bound for service IPs and redirecting them to pods, all of which happens in kernel space.
In this mode kube-proxy’s job is more or less limited to keeping netfilter rules in sync.
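As far as I know, in iptables mode the pod is chosen by a chain of rules using the netfilter `statistic` match in random mode, where the rule for pod i of n fires with probability 1/(n - i) and the last rule always matches. The sketch below only models that arithmetic; the function names and pod addresses are hypothetical.

```python
import random
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule probabilities for n pods: 1/n, 1/(n-1), ..., 1."""
    return [Fraction(1, n - i) for i in range(n)]


def overall_probabilities(n):
    """Chance that each pod is picked once earlier rules have been skipped."""
    remaining = Fraction(1)
    result = []
    for p in rule_probabilities(n):
        result.append(remaining * p)
        remaining *= 1 - p
    return result


def pick_pod(pods, rng=random):
    """Walk the rule chain in order; the final rule has probability 1 and always matches."""
    if not pods:
        raise ValueError("a service needs at least one pod")
    for pod, p in zip(pods, rule_probabilities(len(pods))):
        if rng.random() < p:
            return pod
    raise AssertionError("unreachable: the last rule has probability 1")


pods = ["10.0.2.2:8080", "10.0.2.3:8080", "10.0.2.4:8080"]  # hypothetical endpoints
print(rule_probabilities(len(pods)))
print(overall_probabilities(len(pods)))
print(pick_pod(pods))
```

Although the per-rule probabilities differ, every pod ends up equally likely overall, which is the point of the decreasing denominators.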

The network schema becomes:

(figure: netfilter in action)

However, this is not a good fit for external (public facing) communication, which needs an external fixed IP.

You have dedicated service types for that: NodePort and LoadBalancer:

A service of type NodePort is a ClusterIP service with an additional capability: it is reachable at the IP address of the node as well as at the assigned cluster IP on the services network.
The way this is accomplished is pretty straightforward:

When kubernetes creates a NodePort service, kube-proxy allocates a port in the range 30000–32767 and opens this port on the eth0 interface of every node (thus the name “NodePort”).

Connections to this port are forwarded to the service’s cluster IP.
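Here is a minimal sketch of that behaviour, under stated assumptions: the helper names are hypothetical, picking the lowest free port is a simplification of mine (the real allocator does not have to work that way), and the node IPs are invented.

```python
NODE_PORT_RANGE = range(30000, 32768)  # 30000-32767 inclusive, the range quoted above


def allocate_node_port(in_use, requested=None):
    """Return a free port from the NodePort range, honouring an explicit request if valid."""
    if requested is not None:
        if requested not in NODE_PORT_RANGE:
            raise ValueError(f"{requested} is outside the NodePort range")
        if requested in in_use:
            raise ValueError(f"{requested} is already allocated")
        return requested
    for port in NODE_PORT_RANGE:  # simplification: lowest free port wins
        if port not in in_use:
            return port
    raise RuntimeError("NodePort range exhausted")


def forward(node_port_table, node_ip, port):
    """The same port is open on every node, so the node IP does not affect the result."""
    return node_port_table.get(port)


port = allocate_node_port(in_use=set(), requested=32213)
node_port_table = {port: ("10.3.241.52", 80)}  # node port -> service cluster IP and port

print(forward(node_port_table, "172.20.0.11", 32213))  # hypothetical node IP
print(forward(node_port_table, "172.20.0.12", 32213))  # another node, same answer
```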

You get:

(figure: load-balancer / nodeport)

A LoadBalancer is more advanced, and lets you expose services on standard ports.
See the mapping here:

$ kubectl get svc service-test
NAME      CLUSTER-IP      EXTERNAL-IP     PORT(S)          AGE
openvpn   10.3.241.52     35.184.97.156   80:32213/TCP     5m

However:

Services of type LoadBalancer have some limitations.

  • You cannot configure the lb to terminate https traffic.
  • You can’t do virtual hosts or path-based routing, so you can’t use a single load balancer to proxy to multiple services in any practically useful way.

These limitations led to the addition in version 1.2 of a separate kubernetes resource for configuring load balancers, called an Ingress.

The Ingress API supports TLS termination, virtual hosts, and path-based routing. It can easily set up a load balancer to handle multiple backend services.
The implementation follows a basic kubernetes pattern: a resource type and a controller to manage that type.
The resource in this case is an Ingress, which comprises a request for networking resources.

For instance:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test-ingress
  annotations:
    kubernetes.io/ingress.class: "gce"
spec:
  tls:
    - secretName: my-ssl-secret
  rules:
  - host: testhost.com
    http:
      paths:
      - path: /*
        backend:
          serviceName: service-test
          servicePort: 80

The ingress controller is responsible for satisfying this request by driving resources in the environment to the necessary state.
When using an Ingress you create your services as type NodePort and let the ingress controller figure out how to get traffic to the nodes.
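To illustrate what virtual hosts and path-based routing buy you, here is a toy matcher. It is not how any real ingress controller is implemented: treating a trailing "/*" as a prefix match and letting the first matching rule win are assumptions, and the second rule is hypothetical.

```python
def path_matches(pattern, path):
    """Assumed semantics: a trailing '/*' means prefix match, anything else is exact."""
    if pattern.endswith("/*"):
        return path.startswith(pattern[:-1])
    return path == pattern


def route(rules, host, path):
    """Return (service name, service port) for the first rule matching host and path."""
    for rule in rules:
        if rule["host"] == host and path_matches(rule["path"], path):
            return (rule["service"], rule["port"])
    return None


rules = [
    # Mirrors the Ingress shown above.
    {"host": "testhost.com", "path": "/*", "service": "service-test", "port": 80},
    # Hypothetical second virtual host, to show one load balancer serving two services.
    {"host": "otherhost.example", "path": "/*", "service": "other-service", "port": 8080},
]

print(route(rules, "testhost.com", "/index.html"))
print(route(rules, "otherhost.example", "/"))
print(route(rules, "unknown.example", "/"))
```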

There are ingress controller implementations for GCE load balancers, AWS elastic load balancers, and for popular proxies such as NGINX and HAProxy.

answered by VonC on Nov 09 '22 at 09:11
