I'm having a hard time figuring out what's causing Kafka to leak memory.
scala_version: kafka_2.11
kafka_version: 0.10.2.1
I have around 4 GB of memory. Here's what server memory usage looks like over one month:
I know it's Kafka eating into the RAM, because usage drops whenever I restart it.
And the output from top points to the ever-increasing resident memory of the java process:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23758 kafka 20 0 7673740 1.326g 7992 S 59.5 36.7 6379:29 java
one month later:
23758 kafka 20 0 8756340 2.288g 3736 S 41.9 63.3 45498:06 java
Here's what the heap looks like:
Everything looks good here. So the leak must be off-heap.
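For what it's worth, the easiest off-heap numbers to get at are the JVM's NIO buffer pools (they only cover direct and mapped buffers, not every native allocation). A minimal sketch, assuming it runs inside the JVM you care about or that the same beans are read remotely over JMX:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class BufferPoolCheck {
    public static void main(String[] args) {
        // Lists the "direct" and "mapped" NIO buffer pools of this JVM.
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.printf("%s: count=%d, used=%d bytes, capacity=%d bytes%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}

The same beans show up in any JMX console under java.nio:type=BufferPool, which is more practical for a remote broker.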
I have seen this: https://blog.heroku.com/fixing-kafka-memory-leak but it refers to an old version, so it should have been fixed long ago.
Then I found this: [KAFKA-4741] - Memory leak in RecordAccumulator.append, but that seems to be related to producer code, while the leak I'm seeing is on the Kafka VM, not on the Tigase VM where the producer runs.
Here's how I produce messages:
String topicName = getTopicName(packet.getElement());
kafkaProducer.send(
        new ProducerRecord<>(
                "dispatch." + topicName,                              // topic
                (int) (long) fromUser.getShardId(),                   // specifies the exact partition that receives the message
                fromUser.getSiteId() + ":" + fromUser.getDeviceId(),  // key
                packet.getElement().toString()                        // value
        ),
        producerCallback
);
if (log.isLoggable(Level.FINE)) {
    log.log(Level.FINE, "Adding packet to kafka");
}
I suspect some specific configuration setting might be causing the issue, though I mostly use default values for everything.
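For reference, the producer is created with something very close to the defaults. A minimal sketch of that setup (the broker list and the String serializers here are assumptions, not my exact values):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

class ProducerFactory {
    static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        // Assumed broker list for the sketch.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka01:9092,kafka02:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Everything else (acks, batch.size, linger.ms, buffer.memory, ...) is left at the defaults.
        return new KafkaProducer<>(props);
    }
}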
Then on the consumer side I'm seeing:
%3|1503392176.789|FAIL|rdkafka#producer-2| kafka02:9092/1: Receive failed: Disconnected
%3|1503392176.789|ERROR|rdkafka#producer-2| kafka02:9092/1: Receive failed: Disconnected
%3|1503392176.854|FAIL|rdkafka#consumer-1| kafka01:9092/0: Receive failed: Disconnected
%3|1503392176.854|ERROR|rdkafka#consumer-1| kafka01:9092/0: Receive failed: Disconnected
I'll run some experiments to figure out whether the leak is caused by producing or consuming. I will also update librdkafka, which I use for consuming, from v0.9.3 to v0.9.5.
If I can figure this out, I'll post an update here. Meanwhile I'm hoping that maybe someone had a similar problem and could point me in the right direction.
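As for the producer-vs-consumer experiment, something along these lines should be enough to isolate the producer path: a standalone load generator writing to a throwaway topic with no consumers attached, while watching the broker's RES in top. A rough sketch (topic name, broker list, and message rate are arbitrary):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerOnlyTest {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka01:9092,kafka02:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Steady stream of small messages; stop with Ctrl+C and watch broker memory separately.
            for (long i = 0; ; i++) {
                producer.send(new ProducerRecord<>("dispatch.test", "key-" + i, "payload-" + i));
                Thread.sleep(10); // roughly 100 messages per second
            }
        }
    }
}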
I've done a couple of things:
It's still leaking, but it's much less of an issue now:
The memory leak is further reduced after enabling compression in the producer config.
I don't know what to make of this; the memory growth seems to depend on how much data is written to storage.
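For reference, enabling compression is a single extra producer property on top of the setup sketched earlier (the exact codec doesn't matter for the point; snappy here is just an example, and gzip and lz4 are also valid in 0.10.x):

// Compresses record batches on the producer before they are sent to the broker.
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");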
Possible candidates:
[KAFKA-6529] - Broker leaks memory and file descriptors after sudden client disconnects
[KAFKA-6185] - Selector memory leak with high likelihood of OOM in case of down conversion
I will upgrade to version 1.1.0 and post another update.
This has been resolved in kafka_2.11-1.0.1.
Also, not sure if it's related, but I did not realize that at least 3 Kafka nodes are required for production with the default settings (the internal __consumer_offsets topic defaults to a replication factor of 3).