
Kafka and firewall rules

We have a fairly strict network segmentation policy. I am deploying an app to a Cloud Foundry instance. Firewall rules have been set up so that the Kafka cluster can be reached from within the Cloud Foundry instance. I believe rules have also been set up for the ZooKeeper instance, but I still need to confirm that.

My problem seems to be that I can produce messages to Kafka, but my consumer doesn't seem to be picking them up. It seems to hang while "polling".

Are there any hidden hosts or ports that I need to cover in my firewall rules, beyond the standard hosts and ports of the Kafka and ZooKeeper nodes?

George Smith asked Jul 22 '16 16:07



2 Answers

Kafka and ZooKeeper are different things. If you are running both on the same machine, you need to open both sets of ports, of course.

Kafka default port:

  • 9092, can be changed in server.properties;

ZooKeeper default ports:

  • 2181 for client connections;
  • 2888 for follower connections to the leader (other ZooKeeper nodes);
  • 3888 for leader election between ZooKeeper nodes;

That's it.
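To check whether the firewall actually lets these ports through, a plain TCP connection test run from inside the app's environment is usually enough. The following is a minimal sketch using only the Python standard library; the function names `can_connect` and `check_endpoints` are made up for this illustration, and a successful connection only shows that the port is open, not that Kafka or ZooKeeper is healthy behind it.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host, port in endpoints
    }
```

Calling `check_endpoints` with every broker on 9092 and every ZooKeeper node on 2181 (using your own host names) gives a quick picture of which rules are missing. A firewall that silently drops packets shows up as a timeout rather than an immediate refusal, so keep the timeout short.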

Kafka also has the listeners and advertised.listeners properties, which cause some confusion for first-time users. To keep it simple: listeners is the network interface and port the broker binds to, and advertised.listeners is the hostname or IP the broker registers in ZooKeeper and hands out to clients. If you put a hostname in there, your clients WILL have to use that hostname to connect, so they must be able to resolve and reach it. The bootstrap address is only used for the first connection. Once that connection is made, the client looks up the addresses of the other brokers (the old consumer gets them from ZooKeeper) and connects to the advertised ones. Your consumer is not working because of that.

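So, to make it work, your firewall has to allow the ZooKeeper client port 2181 from the app, not just the Kafka port; 2888 and 3888 only need to be open between the ZooKeeper nodes themselves. And @Jaya Ananthram is wrong when he tells you that Kafka needs port 2181. It's a ZooKeeper port. Consumers on Kafka 0.10 that use the old ZooKeeper-based API still need to contact ZooKeeper to persist some things, that's it.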

Kafka 0.11.0.0 changed this, so that clients no longer need ZooKeeper at all.

Marcos Arruda answered Oct 17 '22 13:10


TL;DR: There's no hidden port. Check your broker configuration. Make sure that it advertises an IP and port that are reachable by your Kafka consumers.


I came across this question after experiencing the same problem with Kafka 0.10.1.1, using the kafka-python library as a consumer.

No. I captured network traffic, and the consumer doesn't use any other port to communicate with Kafka. If the brokers are configured to use 9092, it will be the only port used by consumers.

But upon further investigation, the broker configuration was at fault in my case.

kafka.advertised.listeners = PLAINTEXT://[private_ip]:9092,SSL://[public_ip]:9093
kafka.listeners = PLAINTEXT://0.0.0.0:9092,SSL://0.0.0.0:9093

I used [public_ip]:9092 as the bootstrap server because I did not have PKI set up but wanted to test my consumer from the public internet.

The consumer was able to connect to the broker but wasn't able to pull any messages.

Since the consumer connected to Kafka using PLAINTEXT, Kafka advertised the PLAINTEXT broker addresses instead of the SSL addresses. The consumer then tried to reach the Kafka brokers using private IP addresses instead of public ones, as a raw network capture revealed.
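The mismatch described above can be reproduced on paper with a small model of how a broker picks the address it hands back. This is a simplified sketch, not Kafka's actual code: `parse_listeners` and `advertised_for_port` are hypothetical helpers, and the addresses 10.0.0.5 and 203.0.113.10 are stand-ins for the private and public IPs.

```python
def parse_listeners(value):
    """Parse 'NAME://host:port,NAME://host:port' into {NAME: (host, port)}."""
    listeners = {}
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        host, _, port = address.rpartition(":")
        listeners[name] = (host, int(port))
    return listeners


def advertised_for_port(listeners, advertised, bootstrap_port):
    """Return the (host, port) a client is told to use after bootstrapping.

    The client reaches the broker on bootstrap_port; the broker answers with
    the advertised address of the listener bound to that same port.
    """
    bound = parse_listeners(listeners)
    published = parse_listeners(advertised)
    for name, (_, port) in bound.items():
        if port == bootstrap_port:
            return published[name]
    raise ValueError(f"no listener is bound to port {bootstrap_port}")


# Stand-in values mirroring the configuration in this answer (assumed IPs).
LISTENERS = "PLAINTEXT://0.0.0.0:9092,SSL://0.0.0.0:9093"
ADVERTISED = "PLAINTEXT://10.0.0.5:9092,SSL://203.0.113.10:9093"

# Bootstrapping on the PLAINTEXT port yields the private address.
print(advertised_for_port(LISTENERS, ADVERTISED, 9092))
# Bootstrapping on the SSL port yields the public address.
print(advertised_for_port(LISTENERS, ADVERTISED, 9093))
```

In this model, a client outside the private network that bootstraps on 9092 is sent to 10.0.0.5:9092, which it cannot reach, so the initial connection succeeds and every later fetch hangs.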

After PKI was enabled and configured on the brokers and clients, I was able to pull messages from the public internet just fine.

Duke Grouchy answered Oct 17 '22 15:10