 

Kafka off-heap memory leak

I'm having a hard time figuring out what's causing Kafka to leak memory.

scala_version: kafka_2.11
kafka_version: 0.10.2.1

I have around 4 GB of memory. Here's what server memory usage looks like over one month:

[image: kafka memory leak]

I know it's Kafka eating into the RAM, because usage drops whenever I restart it.

The output of top also shows the resident memory (RES) of the java process growing steadily:

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
23758 kafka     20   0 7673740 1.326g   7992 S 59.5 36.7   6379:29 java
One month later:
23758 kafka     20   0 8756340 2.288g   3736 S 41.9 63.3  45498:06 java
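To put the two top readings above into numbers, here is a small back-of-the-envelope sketch. It assumes that "one month later" means 30 days, and that top prints plain numbers in KiB with m and g suffixes for MiB and GiB. LeakRate and its methods are hypothetical helpers written for this post.

```java
import java.util.Locale;

/** Back-of-the-envelope leak rate from two top RES readings (hypothetical helper). */
public final class LeakRate {
    private LeakRate() {}

    /** Converts a top RES/VIRT field to KiB: plain numbers are KiB, "m" is MiB, "g" is GiB. */
    static double toKiB(String field) {
        String f = field.trim().toLowerCase(Locale.ROOT);
        char unit = f.charAt(f.length() - 1);
        String number = f.substring(0, f.length() - 1);
        switch (unit) {
            case 'g': return Double.parseDouble(number) * 1024 * 1024;
            case 'm': return Double.parseDouble(number) * 1024;
            case 'k': return Double.parseDouble(number);
            default:  return Double.parseDouble(f); // no suffix: top's default unit, KiB
        }
    }

    /** Average growth in MiB per day between two readings taken `days` apart. */
    static double mibPerDay(String before, String after, double days) {
        return (toKiB(after) - toKiB(before)) / 1024.0 / days;
    }

    /** Physical RAM in GiB implied by one reading, since %MEM = RES / physical memory. */
    static double impliedRamGiB(String res, double memPercent) {
        return toKiB(res) / (1024.0 * 1024.0) / (memPercent / 100.0);
    }

    public static void main(String[] args) {
        // Readings from the question; 30 days is an assumption for "one month later".
        System.out.printf("growth: %.1f MiB/day%n", mibPerDay("1.326g", "2.288g", 30));
        System.out.printf("implied RAM: %.2f GiB%n", impliedRamGiB("1.326g", 36.7));
    }
}
```

With the readings from the question this comes out to roughly 33 MiB of RES growth per day, and RES divided by %MEM implies about 3.6 GiB of physical memory, which agrees with the roughly 4 GB mentioned above.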

Here's what the heap looks like:

[image: kafka heap memory]

The heap looks healthy, so the leak must be off-heap.
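One way to check the off-heap suspicion is to compare RES with what the JVM itself accounts for: committed heap, committed non-heap (metaspace, code cache) and the NIO buffer pools. The sketch below reads those numbers for the JVM it runs in, using the standard java.lang.management API; OffHeapReport is a hypothetical name. For the broker, the buffer-pool figures are exposed over JMX as java.nio:type=BufferPool,name=direct and name=mapped. Memory that these beans do not cover (thread stacks, allocations made by native libraries) will not show up here; HotSpot's Native Memory Tracking (-XX:NativeMemoryTracking=summary, then jcmd <pid> VM.native_memory summary) covers more of the JVM's own native usage.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

/** Prints the memory this JVM accounts for, heap and off-heap (hypothetical helper). */
public final class OffHeapReport {
    private OffHeapReport() {}

    /** Bytes used by each NIO buffer pool ("direct" and "mapped" on HotSpot); -1 if unknown. */
    static Map<String, Long> bufferPools() {
        Map<String, Long> pools = new LinkedHashMap<>();
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            pools.put(pool.getName(), pool.getMemoryUsed());
        }
        return pools;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        MemoryUsage nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
        System.out.printf("heap committed:     %,d bytes%n", heap.getCommitted());
        System.out.printf("non-heap committed: %,d bytes%n", nonHeap.getCommitted());
        // RES minus the totals printed here is a rough measure of untracked native memory.
        bufferPools().forEach((name, used) ->
                System.out.printf("buffer pool %-8s %,d bytes%n", name, used));
    }
}
```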

I have seen https://blog.heroku.com/fixing-kafka-memory-leak, but it refers to an old version, so that issue should have been fixed long ago.

Then I found [KAFKA-4741] - Memory leak in RecordAccumulator.append, but it seems to be related to the producer code, and I see the leak on the Tigase VM.

Here's how I produce messages:

String topicName = getTopicName(packet.getElement());
// Partition = the user's shard, key = "siteId:deviceId", value = the serialized XML element.
kafkaProducer.send(
    new ProducerRecord<>(
        "dispatch." + topicName,
        (int) (long) fromUser.getShardId(), // specifies the exact partition that receives the message
        fromUser.getSiteId() + ":" + fromUser.getDeviceId(),
        packet.getElement().toString()
    ),
    producerCallback // invoked when the send completes, successfully or with an exception
);
if (log.isLoggable(Level.FINE)) {
    log.log(Level.FINE, "Adding packet to kafka");
}
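A note on the (int) (long) cast in the snippet above: if getShardId() returns a Long, the first cast unboxes it and the second silently truncates it to an int, so an out-of-range shard id wraps around instead of failing. Because an explicit partition bypasses the producer's partitioner, the shard id also has to be a valid partition of the topic; as far as I know, the Java producer rejects a record whose partition is outside the topic's range. A hypothetical helper that makes both checks explicit might look like this:

```java
/** Hypothetical helper: turns a shard id into an explicit partition number, failing loudly. */
public final class PartitionChoice {
    private PartitionChoice() {}

    static int partitionFor(long shardId, int partitionCount) {
        // Math.toIntExact throws ArithmeticException instead of wrapping like (int) (long) does.
        int partition = Math.toIntExact(shardId);
        if (partition < 0 || partition >= partitionCount) {
            throw new IllegalArgumentException(
                    "shard " + shardId + " has no partition; topic has " + partitionCount);
        }
        return partition;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor(3L, 8));   // 3
        try {
            partitionFor(8L, 8);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        System.out.println((int) 4_294_967_299L);  // 3: the plain cast wraps 2^32 + 3 silently
    }
}
```

In the real code, partitionCount could come from kafkaProducer.partitionsFor(topic).size().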

I suspect some specific configuration may be causing the issue, though I use default values for almost everything.
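Since the setup relies mostly on defaults, it can help to write the relevant producer defaults out explicitly, so that each later experiment (such as turning on compression) changes exactly one line. This sketch only builds a java.util.Properties object, so it runs without kafka-clients on the classpath. The values are what I believe the 0.10/0.11 Java producer defaults to; the host names kafka01 and kafka02 are the ones that appear in the client logs in this question, and ProducerDefaults is a hypothetical name.

```java
import java.util.Properties;

/** Producer settings written out at their (assumed) 0.10/0.11 defaults. */
public final class ProducerDefaults {
    private ProducerDefaults() {}

    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Keys and values are Strings in the snippet from the question.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("buffer.memory", "33554432");  // 32 MiB for records waiting to be sent
        props.put("batch.size", "16384");        // max bytes per partition batch
        props.put("linger.ms", "0");             // do not wait to fill batches
        props.put("compression.type", "none");   // batches are sent uncompressed
        props.put("acks", "1");                  // leader acknowledgement only
        return props;
    }

    public static void main(String[] args) {
        // With kafka-clients available: new KafkaProducer<String, String>(producerProps(...))
        producerProps("kafka01:9092,kafka02:9092")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```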

On the consumer side, I'm seeing:

%3|1503392176.789|FAIL|rdkafka#producer-2| kafka02:9092/1: Receive failed: Disconnected
%3|1503392176.789|ERROR|rdkafka#producer-2| kafka02:9092/1: Receive failed: Disconnected
%3|1503392176.854|FAIL|rdkafka#consumer-1| kafka01:9092/0: Receive failed: Disconnected
%3|1503392176.854|ERROR|rdkafka#consumer-1| kafka01:9092/0: Receive failed: Disconnected
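These lines follow the format librdkafka uses when logging to stderr: %level|unix-seconds.millis|FACILITY|client-name| message, where level 3 is the syslog "error" level. In the excerpt above each disconnect appears twice, once as FAIL and once as ERROR, so counting only FAIL lines gives one count per event. A common benign source of such disconnects is the broker closing connections idle for longer than its connections.max.idle.ms (10 minutes by default), so counting them per broker over time is a cheap way to see whether they track the memory growth. DisconnectCounter is a hypothetical helper, and its regular expression assumes exactly the format shown above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Counts librdkafka "Disconnected" events per broker (hypothetical helper). */
public final class DisconnectCounter {
    private DisconnectCounter() {}

    // Groups: 1 level, 2 seconds, 3 millis, 4 facility, 5 client name, 6 broker, 7 message.
    private static final Pattern LINE = Pattern.compile(
            "^%(\\d+)\\|(\\d+)\\.(\\d+)\\|([A-Z_]+)\\|([^|]+)\\|\\s*(\\S+):\\s+(.*)$");

    static Map<String, Integer> countDisconnects(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            // Only FAIL lines are counted, since each event is echoed once more as ERROR.
            if (m.matches() && m.group(4).equals("FAIL") && m.group(7).contains("Disconnected")) {
                counts.merge(m.group(6), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "%3|1503392176.789|FAIL|rdkafka#producer-2| kafka02:9092/1: Receive failed: Disconnected",
                "%3|1503392176.789|ERROR|rdkafka#producer-2| kafka02:9092/1: Receive failed: Disconnected");
        System.out.println(countDisconnects(lines)); // {kafka02:9092/1=1}
    }
}
```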

I'll run some experiments to figure out whether the leak is caused by producing or consuming. I'll also upgrade librdkafka, which I use for consuming, from v0.9.3 to v0.9.5.
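For the producing-versus-consuming experiment, one approach is to sample the broker's resident memory at a fixed interval while only one kind of load is running, then compare the slopes. On Linux, essentially the same number top shows as RES is available as VmRSS in /proc/<pid>/status. The sketch below is a minimal, hypothetical sampler under that Linux-only assumption; pass it the broker's PID and an interval in seconds.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;

/** Samples a process's resident set size from /proc (Linux only, hypothetical helper). */
public final class RssSampler {
    private RssSampler() {}

    /** Extracts VmRSS in KiB from the text of /proc/<pid>/status, or -1 if it is absent. */
    static long vmRssKiB(String statusText) {
        for (String line : statusText.split("\n")) {
            if (line.startsWith("VmRSS:")) {
                // Line format: "VmRSS:\t 1390412 kB"
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Usage: java RssSampler <broker-pid> [interval-seconds]; defaults to this JVM, 60 s.
        String pid = args.length > 0 ? args[0] : "self";
        long intervalMs = (args.length > 1 ? Long.parseLong(args[1]) : 60) * 1000;
        Path status = Paths.get("/proc", pid, "status");
        while (true) {
            String text = String.join("\n", Files.readAllLines(status));
            System.out.println(Instant.now() + " VmRSS_kB=" + vmRssKiB(text));
            Thread.sleep(intervalMs);
        }
    }
}
```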

If I can figure this out, I'll post an update here. Meanwhile, I'm hoping someone has had a similar problem and can point me in the right direction.

UPDATE #1:

I've done a couple of things:

  • Doubled the amount of RAM on the virtual machine.
  • Upgraded to Kafka 0.11.0.2.

It's still leaking, but it's much less of an issue now:

[image]

UPDATE #2:

Memory leakage is reduced further after enabling compression in the producer config.

[image: after producer compression]

I don't know what to make of this. Memory leakage seems to depend on how much data is written to storage.
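One plausible reading of UPDATE #2: the producer compresses whole record batches, and with the broker's compression.type left at its default of producer, the broker keeps the producer's codec, so fewer bytes are written to storage. The sketch below estimates that effect using java.util.zip GZIP on a batch of hypothetical XMPP-like stanzas standing in for packet.getElement().toString(). GZIP is only a stand-in for whichever codec the producer actually uses (gzip, snappy or lz4 in these versions), and real ratios depend on the data; CompressionEstimate is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Rough estimate of how much a compressed batch shrinks (hypothetical helper). */
public final class CompressionEstimate {
    private CompressionEstimate() {}

    /** Size in bytes of `data` after GZIP compression. */
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory stream
        }
        return out.size(); // read after close(), so the GZIP trailer is included
    }

    /** Builds n similar stanzas, a made-up stand-in for the real message values. */
    static byte[] sampleBatch(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("<message from='user").append(i)
              .append("@example.com' to='room@example.com' type='groupchat'><body>hello</body></message>");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] batch = sampleBatch(100);
        System.out.printf("raw=%d bytes, gzip=%d bytes%n", batch.length, gzipSize(batch));
    }
}
```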

Possible candidates:

[KAFKA-6529] - Broker leaks memory and file descriptors after sudden client disconnects

[KAFKA-6185] - Selector memory leak with high likelihood of OOM in case of down conversion

I will upgrade to version 1.1.0 and post another update.

Julius Žaromskis asked Sep 11 '25 11:09



1 Answer

This has been resolved in kafka_2.11-1.0.1.

[image: kafka memory 1.0.1]
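The Kafka versions in this thread are 0.10.2.1 (original setup), 0.11.0.2 (UPDATE #1) and 1.0.1 (where the leak went away). If you script a check such as "is this node at least 1.0.1?", compare versions numerically: as plain strings, "0.10.2.1" sorts before "0.9.0.1". A hypothetical helper for the kafka_<scala>-<version> distribution names used here:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Numeric version check for kafka_<scala>-<version> names (hypothetical helper). */
public final class KafkaVersionCheck {
    private KafkaVersionCheck() {}

    private static final Pattern DIST = Pattern.compile("^kafka_(\\d+\\.\\d+)-([\\d.]+)$");

    /** Returns the Kafka part of a distribution name, e.g. "1.0.1" from "kafka_2.11-1.0.1". */
    static String kafkaVersion(String distName) {
        Matcher m = DIST.matcher(distName);
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognized distribution name: " + distName);
        }
        return m.group(2);
    }

    /** Component-wise numeric comparison; missing components count as 0. */
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    static boolean isAtLeast(String distName, String minVersion) {
        return compare(kafkaVersion(distName), minVersion) >= 0;
    }

    public static void main(String[] args) {
        for (String dist : new String[] {"kafka_2.11-0.10.2.1", "kafka_2.11-0.11.0.2", "kafka_2.11-1.0.1"}) {
            System.out.println(dist + " >= 1.0.1? " + isAtLeast(dist, "1.0.1"));
        }
    }
}
```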

Also, not sure if it's related, but I hadn't realized that a production cluster should have at least 3 Kafka nodes.
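The usual reasoning behind the three-node advice: a topic's replication factor cannot exceed the number of brokers, and producers using acks=all need at least min.insync.replicas replicas in sync, so per partition the cluster can lose replicationFactor - minInsyncReplicas brokers before such writes are rejected. With replication factor 3 and min.insync.replicas=2 that is one broker, hence three as a common minimum. The sketch below only encodes that arithmetic plus a typical set of broker settings; the keys are real server.properties keys, but the values are a common choice rather than a requirement, and ClusterSizing is a hypothetical name.

```java
import java.util.Properties;

/** Arithmetic behind the "at least 3 brokers" advice (hypothetical helper). */
public final class ClusterSizing {
    private ClusterSizing() {}

    /** Replica failures a partition tolerates while acks=all writes still succeed. */
    static int tolerableFailures(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException("need 1 <= min.insync.replicas <= replication factor");
        }
        return replicationFactor - minInsyncReplicas;
    }

    /** A common starting point for a 3-broker cluster, as server.properties keys. */
    static Properties brokerProps() {
        Properties p = new Properties();
        p.setProperty("default.replication.factor", "3");
        p.setProperty("min.insync.replicas", "2");
        p.setProperty("offsets.topic.replication.factor", "3");
        return p;
    }

    public static void main(String[] args) {
        System.out.println("RF=3, minISR=2 tolerates " + tolerableFailures(3, 2)); // 1
        System.out.println("RF=1, minISR=1 tolerates " + tolerableFailures(1, 1)); // 0
        brokerProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```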

Julius Žaromskis answered Sep 12 '25 23:09
