I have read the instructions not to use swap on both ZooKeeper and Kafka. I know that Kafka depends on the page cache to keep parts of its sequential logs cached in memory even after they are written to disk.
But I cannot understand how swapping can harm ZooKeeper and Kafka.
Change is coming for users of Apache Kafka, the leading distributed event-streaming platform: according to Kafka's developers, replacing ZooKeeper with internally managed metadata will improve scalability and management.
If one of the ZooKeeper nodes fails, the following occurs: the other ZooKeeper nodes detect that it has stopped responding, and if the failed node was the current leader, a new leader is elected. If multiple nodes fail and ZooKeeper loses its quorum (a majority of the ensemble), it rejects requests for changes; servers that have read-only mode enabled can still serve reads.
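To make "loses its quorum" concrete: ZooKeeper's default quorum is a strict majority of the ensemble. The arithmetic below is a minimal sketch of that rule; the function names are hypothetical and it does not talk to a real ensemble.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of live servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(ensemble_size: int, live_servers: int) -> bool:
    """True if the live servers still form a majority of the ensemble."""
    return live_servers >= quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Note that a 4-server ensemble tolerates only one failure, the same as a 3-server one, which is why odd ensemble sizes are the usual choice.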
However, you can also install and run Kafka without ZooKeeper (KRaft mode). In this case, instead of being stored in ZooKeeper, all the cluster metadata is stored in a separate internal partition within Kafka itself.
In general, ZooKeeper provides an in-sync view of the Kafka cluster, while Kafka itself handles the actual connections from clients (producers and consumers) and manages the topic logs, topic log partitions, consumer groups, and individual offsets.
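Swapping may cause performance as well as stability problems; in your case, you don't want the Linux kernel to swap out the memory of your Kafka or ZooKeeper processes by mistake. A process stalled waiting for pages to be read back from swap can also miss its heartbeats, so a healthy broker or ZooKeeper server may be declared dead, triggering session expirations and leader elections.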
Also, swapping may be particularly bad for JVM processes such as Kafka and ZooKeeper, quoting:
[The] JVM generally won't do a full GC cycle until it has run out of its allowed heap, so most of your heap is likely occupied by not-yet-collected garbage. Since these pages aren't being touched (because they are garbage and thus unreferenced), the OS happily swaps them out. When GC finally runs, you have a ridiculous swap storm, pulling in all these pages only to then discover that they are in fact filled with garbage and should be discarded; this can easily make your GC cycle take many minutes!
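A back-of-the-envelope estimate shows why the GC swap storm described above can take minutes. This is a simplified model, not a measurement: it assumes 4 KiB pages and one random read per swapped-out page (kernel swap readahead can batch neighbouring pages), and the heap size and IOPS figures are hypothetical.

```python
# Assumed page size; 4 KiB is typical on x86-64 Linux.
PAGE_SIZE = 4096


def swap_in_seconds(swapped_bytes: int, read_iops: float, page_size: int = PAGE_SIZE) -> float:
    """Estimate how long faulting swapped pages back in takes.

    Simplified model (an assumption): every swapped-out page costs one
    random read, so time = number of pages / random-read IOPS.
    """
    if read_iops <= 0:
        raise ValueError("read_iops must be positive")
    pages = -(-swapped_bytes // page_size)  # integer ceiling division
    return pages / read_iops


if __name__ == "__main__":
    garbage = 2 * 1024**3  # hypothetical: 2 GiB of uncollected garbage swapped out
    for label, iops in (("SSD (assumed 10,000 IOPS)", 10_000),
                        ("HDD (assumed 200 IOPS)", 200)):
        secs = swap_in_seconds(garbage, iops)
        print(f"{label}: ~{secs:.0f} s (~{secs / 60:.1f} min)")
```

Under these assumptions, 2 GiB is 524,288 pages: about 52 seconds on the assumed SSD and about 44 minutes on the assumed HDD, all spent reading back memory that GC is about to discard.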
Hence the recommendation to disable swapping by setting vm.swappiness to 0, though on some operating systems, such as RHEL 6.5, it should actually be 1 (because the semantics of the value 0 changed on these systems). Note that some swapping may still occur.
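To see where a Linux host stands, you can read the current value from /proc/sys/vm/swappiness (the file behind the vm.swappiness sysctl) and compare SwapTotal with SwapFree in /proc/meminfo. The sketch below does that; the accepted range of 0 to 1 and the function names are assumptions for illustration, not an official check shipped with Kafka or ZooKeeper.

```python
from pathlib import Path

SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")
MEMINFO_PATH = Path("/proc/meminfo")


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo lines such as 'SwapTotal:  2097148 kB' into {name: value}."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])  # memory fields are in kB
    return values


def swap_used_kb(meminfo: dict[str, int]) -> int:
    """Swap currently in use, in kB (SwapTotal - SwapFree)."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def swappiness_ok(value: int, max_allowed: int = 1) -> bool:
    """Hypothetical policy: accept 0 or 1, following the recommendation above."""
    return 0 <= value <= max_allowed


# Only touch /proc when it exists, so the functions stay usable on other systems.
if __name__ == "__main__" and SWAPPINESS_PATH.exists() and MEMINFO_PATH.exists():
    swappiness = int(SWAPPINESS_PATH.read_text().strip())
    meminfo = parse_meminfo(MEMINFO_PATH.read_text())
    status = "ok" if swappiness_ok(swappiness) else "consider lowering it"
    print(f"vm.swappiness = {swappiness} ({status})")
    print(f"swap in use: {swap_used_kb(meminfo)} kB")
```

A non-zero "swap in use" figure on a Kafka or ZooKeeper host is a hint that the earlier concerns may apply, even with a low swappiness value.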
The following links may shed further light on your question. They explain why swapping should be disabled for Hadoop and Elasticsearch, respectively, and the same reasons apply to Kafka and ZooKeeper: