
Kubernetes - Can't connect to a service IP from the service's pod

I'm trying to create 3 instances of Kafka and deploy them to a local Kubernetes setup. Because each instance needs some specific configuration, I'm creating one RC and one service for each - eagerly waiting for #18016 ;)
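
For reference, each per-broker Service looks roughly like this (a minimal sketch; the exact names, labels, and port are assumptions based on my naming scheme):

    apiVersion: v1
    kind: Service
    metadata:
      name: kafka1
    spec:
      selector:
        app: kafka
        broker: "1"          # each RC labels its pod with a unique broker id (assumed)
      ports:
        - port: 9092         # standard Kafka broker port (assumed)
          targetPort: 9092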

However, I'm having problems because Kafka can't establish a network connection to itself when it uses the service IP (a Kafka broker tries to do this when it is exchanging replication messages with other brokers). For example, let's say I have two worker hosts (172.17.8.201 and 172.17.8.202) and my pods are scheduled like this:

  • Host 1 (172.17.8.201)

    • kafka1 pod (10.2.16.1)
  • Host 2 (172.17.8.202)

    • kafka2 pod (10.2.68.1)
    • kafka3 pod (10.2.68.2)

In addition, let's say I have the following service IPs:

  • kafka1 cluster IP: 11.1.2.96
  • kafka2 cluster IP: 11.1.2.120
  • kafka3 cluster IP: 11.1.2.123

The problem happens when the kafka1 pod (container) tries to send a message (to itself) using the kafka1 cluster IP (11.1.2.96). For some reason, the connection cannot be established and the message is not sent.

Some more information: if I manually connect to the kafka1 pod, I can correctly telnet to the kafka2 and kafka3 pods using their respective cluster IPs (11.1.2.120 / 11.1.2.123). Likewise, from the kafka2 pod, I can connect to both kafka1 and kafka3 using 11.1.2.96 and 11.1.2.123. Finally, I can connect to all pods (from all pods) if I use the pod IPs.

It is important to emphasize that I can't simply tell the Kafka brokers to use the pod IPs instead of the cluster IPs for replication. As it stands, Kafka uses for replication whatever IP you configure to be "advertised" - which is the same IP that clients use to connect to the brokers. And even if I could, I believe this problem may appear with other software as well.
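
To illustrate (a hedged sketch - the image name and environment variable depend on which Kafka image you use, so treat both as assumptions): the advertised address is typically injected into the pod and points at the service IP, which means replication traffic also targets the service IP:

    # Fragment of the kafka1 RC's pod template (not a complete manifest).
    containers:
      - name: kafka
        image: my-kafka-image                  # hypothetical image name
        env:
          - name: KAFKA_ADVERTISED_HOST_NAME   # variable name depends on the image
            value: "11.1.2.96"                 # kafka1's cluster IP: clients and
                                               # replication both use this address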

This problem seems to happen only with the combination I am using, because the exact same files work correctly in GCE. Right now, I'm running:

  • Kubernetes 1.1.2
  • CoreOS 928.0.0
  • networking with flannel
  • everything on Vagrant + VirtualBox

After some debugging, I'm not sure whether the problem is in the workers' iptables rules, in kube-proxy, or in flannel.

PS: I originally posted this question as an issue on the Kubernetes GitHub repository, but I was redirected here by the Kubernetes team. I reworded the text a bit because it sounded like a "support request", but I actually believe it is some sort of bug. Anyway, sorry about that, Kubernetes team!


Edit: This problem has been confirmed as a bug: https://github.com/kubernetes/kubernetes/issues/20391

asked Jan 22 '16 by virsox

1 Answer

For what you want to do, you should be using a headless service: http://kubernetes.io/v1.0/docs/user-guide/services.html#headless-services

This means setting

clusterIP: None

in your Service definition.

With that set, there won't be a virtual IP associated with the service; instead, a DNS lookup of the service will return the IPs of all Pods matched by the selector.
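
A minimal sketch of what that looks like, reusing the kafka1 naming from the question (the labels and port are assumptions):

    apiVersion: v1
    kind: Service
    metadata:
      name: kafka1
    spec:
      clusterIP: None      # headless: no virtual IP, no kube-proxy in the path
      selector:
        app: kafka
        broker: "1"
      ports:
        - port: 9092

Because there is no virtual service IP to proxy through, DNS resolves the service name directly to the pod IPs, so a broker ends up advertising an address it can actually reach itself on.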

answered Oct 11 '22 by MrE