I'm looking at the Kafka documentation, in particular at the Persistence section:
[Image: Kafka documentation, Persistence section]
If I understood correctly, the last lines say that Kafka writes data to disk as it arrives instead of keeping it in RAM. This sounds really strange to me (aren't disk writes heavy operations?), but clearly I trust the Kafka developers. First of all, I would like confirmation of that.
Then, assuming it is true, I tried to verify it: I ran a simple task with a data stream of 500kb/s for some minutes on a machine with 4GB-200GB, and I produced graphs of RAM usage (%) and disk space usage (MB). You can find the graphs here:
RAM : https://ibb.co/mzYD5m
DISK SPACE: https://ibb.co/coAMrR
(The stream starts being ingested at second 125 and finishes at around second 870.)
According to what I understood, I expected the disk space graph to decrease linearly (as space is gradually occupied while data arrives). Instead, I can't explain the flat regions, which indicate that no additional space is occupied during the corresponding seconds.
Moreover, further on in the documentation, there is this section:
[Image: Kafka documentation, Linux flush behaviour section]
which seems to describe the opposite behaviour to the "Persistence" section. It says that Linux uses a pagecache (stored in RAM, I suppose) to provide a disk cache. This could explain the flat regions in the second graph, but it goes against Kafka's principle of avoiding writes to volatile memory.
I'm really confused.
Thank you, Andrea
Kafka always writes directly to disk, but remember one thing: the I/O operations are actually carried out by the operating system. In the case of Linux, it seems the data is written to the page cache until it can be written to the disk. Kafka has done its job by handing the operating system the data to be written to disk, but it is the operating system that decides when and how to write the data. Hope that answers your question.
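To make the distinction concrete, here is a minimal Python sketch of the general mechanism, not Kafka's actual code. The function name `append_record` and the file name `segment.log` are made up for illustration. A plain `write` returns as soon as the operating system has accepted the data (on Linux this normally means it sits in the page cache), while `fsync` asks the OS to push that file's data to the storage device before returning.

```python
import os
import tempfile


def append_record(path, payload, force_flush=False):
    """Append payload to path and return the number of bytes written.

    Hypothetical helper for illustration only; this is not Kafka code.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # Returns once the OS has accepted the bytes. On Linux they
        # typically land in the page cache and reach the disk later.
        written = os.write(fd, payload)
        if force_flush:
            # Ask the OS to flush this file's data to the device now.
            os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "segment.log")
        # The application has "written to disk", but the flush is up to the OS.
        append_record(log_path, b"message-1\n")
        # Here the application explicitly requests the flush.
        append_record(log_path, b"message-2\n", force_flush=True)
        print(os.path.getsize(log_path))
```

In both calls the application has handed the data to the filesystem; the only difference is who decides when it reaches the physical disk.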