I've been running Kafka on Kubernetes without any major issues for a while now; however, I recently introduced a cluster of Cassandra pods and started having performance problems with Kafka.
Even though Cassandra doesn't use the page cache the way Kafka does, it does write to disk frequently, which presumably affects the kernel's underlying cache.
I understand that Kubernetes pods manage memory resources through cgroups, which can be configured by setting memory requests and limits, but I've noticed that Cassandra's use of the page cache increases the number of page faults in my Kafka pods even when they don't appear to be competing for resources (i.e., there's memory available on their nodes).
In Kafka, more page faults lead to more disk writes, which undermine the benefits of sequential IO and compromise disk performance. If you use something like AWS EBS volumes, this eventually depletes your burst balance and can cause catastrophic failures across your cluster.
My question is: is it possible to isolate page cache resources in Kubernetes, or to somehow tell the kernel that pages owned by my Kafka pods should be kept in the cache longer than those owned by my Cassandra pods?
Kafka relies heavily on the file system for storing and caching messages. All data is written to the page cache in the form of log segment files and flushed to disk later. Most modern Linux systems use free memory for the disk cache, so on a machine with 32 GB of memory Kafka can end up using 25-30 GB of it as page cache.
In Kafka Streams, a record cache is additionally used for internal caching and compacting of output records before they are written from a stateful processor node to its state stores.
Running Kafka on Kubernetes enables organizations to simplify operations such as updates, restarts, and monitoring, which are more or less built into the Kubernetes platform.
Kubernetes tracks two main resource types, CPU and memory, and the scheduler uses their requests and limits to decide where to run your pods (see the Kubernetes documentation on managing resources for containers). If you are running on Google Kubernetes Engine (GKE), the default namespace already has some requests and limits set up for you.
I thought this was an interesting question, so here are some findings from a bit of digging.
Best guess: there is no way to do this with Kubernetes out of the box, but there is enough tooling available that it could be a fruitful area for research and development of a tuning and policy application that could be deployed as a DaemonSet.
Findings:
Applications can use the posix_fadvise() system call to tell the kernel which file-backed pages they still need and which they do not, so that the latter can be reclaimed.
http://man7.org/linux/man-pages/man2/posix_fadvise.2.html
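
As a rough illustration (my own sketch, not something from the man page, and the file path is hypothetical), dropping a file's pages from the cache with posix_fadvise() in C looks like this:

    #define _POSIX_C_SOURCE 200112L  /* for posix_fadvise */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        /* Hypothetical file whose cached pages we no longer need. */
        const char *path = "/var/lib/cassandra/data/example-sstable.db";

        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            perror("open");
            return EXIT_FAILURE;
        }

        /* offset 0, len 0 covers the whole file; DONTNEED tells the kernel
           these pages can be reclaimed instead of crowding out other files. */
        int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        if (err != 0)
            fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

        close(fd);
        return EXIT_SUCCESS;
    }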
Applications can also open files with O_DIRECT to try to bypass the page cache when doing IO:
https://lwn.net/Articles/457667/
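
Again as a hedged sketch (the path, block size, and alignment values are assumptions, and the file must live on a filesystem that supports O_DIRECT, e.g. ext4 or xfs rather than tmpfs), direct IO in C looks roughly like:

    #define _GNU_SOURCE           /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        /* Hypothetical scratch file; O_DIRECT writes bypass the page cache. */
        int fd = open("/var/tmp/odirect-example.dat",
                      O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) {
            perror("open");
            return EXIT_FAILURE;
        }

        /* O_DIRECT requires the buffer, offset and length to be aligned,
           typically to the device's logical block size (512 or 4096 bytes). */
        size_t len = 4096;
        void *buf = NULL;
        if (posix_memalign(&buf, 4096, len) != 0) {
            perror("posix_memalign");
            close(fd);
            return EXIT_FAILURE;
        }
        memset(buf, 'x', len);

        if (write(fd, buf, len) != (ssize_t)len)
            perror("write");

        free(buf);
        close(fd);
        return EXIT_SUCCESS;
    }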
There is some indication that Cassandra already uses fadvise in a way that attempts to optimize for reducing its page cache footprint:
http://grokbase.com/t/cassandra/commits/122qha309v/jira-created-cassandra-3948-sequentialwriter-doesnt-fsync-before-posix-fadvise
There is also some recent (Jan 2017) research from Samsung patching Cassandra and fadvise in the kernel to better utilize multi-stream SSDs:
http://www.samsung.com/us/labs/pdfs/collateral/Multi-stream_Cassandra_Whitepaper_Final.pdf
Kafka is aware of the page cache architecture, though it doesn't appear to use fadvise directly. The knobs the kernel exposes are sufficient for tuning Kafka on a dedicated host:
Support in the kernel for device-specific writeback threads goes way back to the 2.6 days:
https://www.thomas-krenn.com/en/wiki/Linux_Page_Cache_Basics
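
For example, the writeback knobs referenced there live under /proc/sys/vm and can be inspected like any other file. A minimal, read-only sketch (adjusting them requires root and affects the whole host, so that part is left out):

    #include <stdio.h>

    /* Print one writeback tunable from /proc/sys/vm. */
    static void print_tunable(const char *path) {
        FILE *f = fopen(path, "r");
        if (!f) {
            perror(path);
            return;
        }
        char buf[64];
        if (fgets(buf, sizeof(buf), f) != NULL)
            printf("%-42s %s", path, buf);
        fclose(f);
    }

    int main(void) {
        /* Percentage of memory that may be dirty before background writeback starts. */
        print_tunable("/proc/sys/vm/dirty_background_ratio");
        /* Percentage of memory at which writers are forced into synchronous writeback. */
        print_tunable("/proc/sys/vm/dirty_ratio");
        /* Age (in centiseconds) after which dirty pages become eligible for writeout. */
        print_tunable("/proc/sys/vm/dirty_expire_centisecs");
        return 0;
    }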
Cgroups v1 and v2 focus on pid-based IO throttling, not file-based cache tuning:
https://andrestc.com/post/cgroups-io/
That said, the old linux-ftools set of utilities gives a simple example of a command-line knob for applying fadvise to specific files:
https://github.com/david415/linux-ftools
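
To give a feel for what a tool like linux-ftools' fincore does under the hood, here is a rough C sketch of my own (not code from that repository) that maps a file and uses mincore() to count how many of its pages are currently resident in the page cache:

    #define _DEFAULT_SOURCE       /* for mincore */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return EXIT_FAILURE;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return EXIT_FAILURE; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return EXIT_FAILURE; }
        if (st.st_size == 0) { printf("%s is empty\n", argv[1]); return EXIT_SUCCESS; }

        /* Map the file without read/write access; we only want residency info. */
        void *map = mmap(NULL, (size_t)st.st_size, PROT_NONE, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        size_t pages = ((size_t)st.st_size + page - 1) / page;
        unsigned char *vec = malloc(pages);
        if (!vec) { perror("malloc"); return EXIT_FAILURE; }

        /* mincore() fills one byte per page; the low bit means "resident in cache". */
        if (mincore(map, (size_t)st.st_size, vec) < 0) { perror("mincore"); return EXIT_FAILURE; }

        size_t resident = 0;
        for (size_t i = 0; i < pages; i++)
            if (vec[i] & 1)
                resident++;

        printf("%s: %zu of %zu pages resident in page cache\n", argv[1], resident, pages);

        free(vec);
        munmap(map, (size_t)st.st_size);
        close(fd);
        return EXIT_SUCCESS;
    }

Pointing this at a Kafka log segment (a hypothetical path such as /var/lib/kafka/data/topic-0/00000000000000000000.log) would show how much of that segment the kernel is still holding in memory.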
So there's enough there to work with. Given specific Kafka and Cassandra workloads (e.g. read-heavy vs. write-heavy), specific prioritizations (Kafka over Cassandra or vice versa), and specific IO configurations (dedicated vs. shared devices), one could arrive at a specific tuning model, and those models could be generalized into a policy model.