I'm trying to create 3 instances of Kafka and deploy them to a local Kubernetes setup. Because each instance needs some specific configuration, I'm creating one RC and one service for each - eagerly waiting for #18016 ;)
However, I'm having problems because Kafka can't establish a network connection to itself when it uses the service IP (a Kafka broker tries to do this when it is exchanging replication messages with other brokers). For example, let's say I have two worker hosts (172.17.8.201 and 172.17.8.202) and my pods are scheduled like this:
Host 1 (172.17.8.201):
- kafka1 pod (10.2.16.1)

Host 2 (172.17.8.202):
- kafka2 pod (10.2.68.1)
- kafka3 pod (10.2.68.2)

In addition, let's say I have the following service IPs:
- kafka1 cluster IP: 11.1.2.96
- kafka2 cluster IP: 11.1.2.120
- kafka3 cluster IP: 11.1.2.123

The problem happens when the kafka1 pod (container) tries to send a message (to itself) using the kafka1 cluster IP (11.1.2.96). For some reason, the connection cannot be established and the message is not sent.
Some more information: if I manually connect to the kafka1 pod, I can correctly telnet to the kafka2 and kafka3 pods using their respective cluster IPs (11.1.2.120 / 11.1.2.123). Also, from the kafka2 pod, I can connect to both the kafka1 and kafka3 pods using 11.1.2.96 and 11.1.2.123. Finally, I can connect to all pods (from all pods) if I use the pod IPs.
It is important to emphasize that I shouldn't have to tell the Kafka brokers to use the pod IPs instead of the cluster IPs for replication. As it is right now, Kafka uses whatever IP you configure as the "advertised" address for replication - the same IP that clients use to connect to the brokers. Even if I could change that, I believe this problem may appear with other software as well.
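For reference, this is roughly what the relevant broker settings look like (property names from the Kafka 0.8.x-era configuration; the values are just the ones from my example above, not a recommendation):

```properties
# server.properties fragment for the kafka1 broker (illustrative values)
broker.id=1
port=9092
# The single address the broker advertises to clients AND to the other
# brokers for replication - here, the kafka1 service cluster IP:
advertised.host.name=11.1.2.96
advertised.port=9092
```

Since replication traffic and client traffic both go through this one advertised address, the broker ends up dialing its own cluster IP, which is exactly the connection that fails.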
This problem seems to happen only with the combination I am using, because the exact same files work correctly in GCE. Right now, I'm running:
After some debugging, I'm not sure if the problem is in the workers' iptables rules, in kube-proxy, or in flannel.
PS: I posted this question originally as an Issue on their GitHub, but I have been redirected here by the Kubernetes team. I reworded the text a bit because it sounded like a "support request", but I actually believe it is some sort of bug. Anyway, sorry about that, Kubernetes team!
Edit: This problem has been confirmed as a bug: https://github.com/kubernetes/kubernetes/issues/20391
For what you want to do, you should be using a Headless Service (http://kubernetes.io/v1.0/docs/user-guide/services.html#headless-services).

This means setting

clusterIP: None

in your Service. There won't be a virtual IP associated with the service; instead, resolving the service name will return the IPs of all the Pods selected by its selector.
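As a sketch, a headless Service for one of the brokers might look like this. The name, labels, and port are assumptions based on the setup described in the question; adjust the selector to match the labels in your kafka1 RC's pod template:

```yaml
# Hypothetical headless Service for the kafka1 broker.
apiVersion: v1
kind: Service
metadata:
  name: kafka1
spec:
  clusterIP: None        # headless: no virtual IP, no kube-proxy rules
  selector:
    app: kafka           # assumed labels - must match the pod template
    broker: "1"
  ports:
  - name: kafka
    port: 9092
    targetPort: 9092
```

Because there is no virtual IP, kube-proxy is bypassed entirely: the broker ends up talking to its own pod IP directly, so the self-connection problem described above does not apply.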