From a performance perspective, is it a good choice to run Kafka in Docker containers? Are there things one should watch out for or tune specifically?
You will need two Docker images to get Kafka running: wurstmeister/zookeeper and wurstmeister/kafka.
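A minimal sketch of how the two containers might be started with the performance points from this thread in mind. It only builds and prints the `docker run` commands and does not execute them. It assumes host networking (to skip Docker's bridge/NAT layer) and a bind-mounted host directory for Kafka's data, so writes bypass the container's copy-on-write filesystem. The helper `docker_run_args`, the host path `/data/kafka` and the environment variable names are illustrative assumptions. Check them against the wurstmeister/kafka README before use.

```python
import shlex

def docker_run_args(name, image, env=None, volumes=None, host_network=True):
    """Build a `docker run` argument list for one container (not executed)."""
    args = ["docker", "run", "-d", "--name", name]
    if host_network:
        # Host networking skips Docker's bridge and port-mapping layer.
        args += ["--network", "host"]
    for key, value in (env or {}).items():
        args += ["-e", f"{key}={value}"]
    for host_path, container_path in (volumes or {}).items():
        # Bind-mount a host directory so data bypasses the container's
        # copy-on-write (overlay) filesystem.
        args += ["-v", f"{host_path}:{container_path}"]
    args.append(image)
    return args

zookeeper = docker_run_args("zookeeper", "wurstmeister/zookeeper")
kafka = docker_run_args(
    "kafka",
    "wurstmeister/kafka",
    env={
        # Assumed settings for a single-host setup; verify in the image docs.
        "KAFKA_ZOOKEEPER_CONNECT": "localhost:2181",
        "KAFKA_ADVERTISED_HOST_NAME": "localhost",
        "KAFKA_LOG_DIRS": "/kafka/logs",
    },
    volumes={"/data/kafka": "/kafka"},  # hypothetical host directory
)

for cmd in (zookeeper, kafka):
    print(shlex.join(cmd))
```

Host networking gives up some isolation in exchange for avoiding the extra network hop, which is usually an acceptable trade for a broker that owns the machine.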
There is a good research paper from IBM on this topic. It is a bit dated by now, but I am sure the basic statements still hold true and have only been improved upon. The gist is that the overhead Docker introduces is quite small when it comes to CPU and memory, but for I/O-heavy applications you need to be a bit more careful. Depending on the workload, I'd put Kafka squarely in the I/O-heavy group, so it is probably not a no-brainer. Kafka benefits a lot from fast disk access, so if you run your containers on some sort of distributed platform with storage attached via a SAN, an NFS share or something similar, I'd assume that you will notice a difference. But if you only chose containers to ease deployment and run them on one physical machine, I'd expect the difference to be negligible.
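One quick way to see where Kafka's data directory actually lives is to look up the filesystem type of the mount that contains it. The sketch below parses `/proc/mounts` (Linux only). The function name `mount_fstype` and the example paths are hypothetical. An NFS share shows up as `nfs`/`nfs4`, and a directory that is not on a volume inside a container typically shows `overlay`. A SAN LUN appears as an ordinary block-device filesystem (ext4, xfs), so it cannot be detected this way.

```python
import os

def mount_fstype(path, mounts_text=None):
    """Return the filesystem type of the mount containing `path` (Linux)."""
    if mounts_text is None:
        with open("/proc/mounts") as f:
            mounts_text = f.read()
    path = os.path.realpath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        # A mount contains `path` if it matches exactly or is a prefix
        # ending at a "/" boundary ("/" itself contains every path).
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # Longest match wins; ">=" lets a later mount shadow an earlier one.
        if inside and len(mount_point) >= len(best_point):
            best_point, best_type = mount_point, fstype
    return best_type

# Hypothetical usage: point it at the directory configured in log.dirs.
# print(mount_fstype("/kafka/logs"))
```

Mount points containing spaces are escaped in `/proc/mounts` (for example `\040`) and are not decoded here.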
But as with all performance questions, it is hard to give a general answer; you'll have to test your specific use case and environment to be sure.
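As a rough first test, you could measure sequential write throughput once on the host and once inside the container, both against a bind-mounted volume and against a path on the container's own filesystem. This is a minimal sketch, assuming that Kafka's append-only log writes are reasonably approximated by large sequential writes followed by an `fsync`. The probe file name is hypothetical. For a realistic picture, Kafka's own `kafka-producer-perf-test.sh` tool is the better benchmark.

```python
import os
import time

def sequential_write_mb_per_s(directory, total_mb=256, chunk_kb=1024):
    """Append `total_mb` of data in fixed-size chunks, fsync, return MB/s."""
    chunk = os.urandom(chunk_kb * 1024)
    chunks = (total_mb * 1024) // chunk_kb
    path = os.path.join(directory, "kafka-io-probe.bin")  # hypothetical name
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(chunks):
            f.write(chunk)  # purely sequential appends, like a log segment
        f.flush()
        os.fsync(f.fileno())  # include the cost of reaching the device
    elapsed = time.perf_counter() - start
    os.remove(path)
    return (chunks * chunk_kb / 1024) / elapsed

# Example: compare a bind-mounted volume with the container's overlay layer.
# print(sequential_write_mb_per_s("/kafka"))
# print(sequential_write_mb_per_s("/tmp"))
```

Run it several times and with a total size larger than the page cache can absorb, otherwise you mostly measure memory rather than the disk.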
I believe the performance would largely be affected by the type of machine you use. LinkedIn and other large users of Kafka often recommend using spinning disks rather than SSDs, because Kafka's reads and writes are predominantly linear and the Kafka protocol relies on zero-copy transfers (the sendfile technique described in IBM's developerWorks article on zero copy). On a machine hosting many containers, the I/O of the different containers interleaves on the same disks and turns those linear accesses into random seeks, so you'd lose the advantages that spinning disks give Kafka.
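To illustrate what zero copy means here: the broker hands the kernel a file and a socket, and the kernel moves the bytes from the page cache to the socket without copying them through a user-space buffer. Kafka does this from Java (via `FileChannel.transferTo`). The sketch below shows the same idea in Python with `socket.sendfile`, which uses `os.sendfile` where the platform supports it and falls back to plain `send()` otherwise. The function name `serve_segment_zero_copy` is hypothetical.

```python
import socket

def serve_segment_zero_copy(path, sock):
    """Send a whole file over a connected stream socket; return bytes sent."""
    with open(path, "rb") as f:
        # socket.sendfile() uses os.sendfile() when available, so the data
        # goes from the page cache to the socket without a user-space copy.
        return sock.sendfile(f)

# Usage with a connected socket pair (a stand-in for a consumer connection):
# a, b = socket.socketpair()
# serve_segment_zero_copy("/kafka/logs/segment.log", a)  # hypothetical path
```

Because the data never passes through the application, this works best when the segment is already in the page cache, which is another reason why memory and disk contention from neighbouring containers matters.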