
What does it mean to 'flush to disk'?

Tags: memory, caching

Could someone explain what is meant by flushing to disk in the following context? If I am writing data to a log on a filesystem, doesn't this mean I am putting it on disk? At what point would/should you flush a file to disk?

This suggests a design which is very simple: rather than maintain as much as possible in-memory and flush it all out to the filesystem in a panic when we run out of space, we invert that. All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache.

(from https://kafka.apache.org/documentation.html#design).

jcm asked Jan 28 '16 00:01



1 Answer

All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache.

What this means is that Kafka hands data off to the kernel with write() syscalls, but doesn't force the kernel to rush it to disk with fsync() calls or similar (as appropriate for the OS at hand). Once write() returns, the data is visible to other processes, but it may or may not actually be reflected on disk yet, so it is not guaranteed to survive a crash or power loss. If you are optimizing for throughput and don't need to guarantee that content is retrievable, this can be an appropriate decision: fsync() and its kin can be expensive calls (though by doing long contiguous writes that don't require seeking, Kafka minimizes the expense of its disk I/O).
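To make the distinction concrete, here is a minimal Python sketch of the two steps. It is an illustration only, not Kafka's actual code: the helper names append_record and demo and the file name demo.log are made up for this example. os.write() hands the bytes to the kernel (the page cache), and os.fsync() is the separate, optional step that asks the kernel to push them to the storage device.

```python
import os
import tempfile


def append_record(fd: int, record: bytes, durable: bool = False) -> None:
    """Append one record to an open file descriptor (hypothetical helper).

    With durable=False the data only reaches the kernel's page cache.
    With durable=True we additionally call fsync() and wait for the kernel
    to report that the data has been pushed to the storage device.
    """
    view = memoryview(record)
    while len(view) > 0:
        # write() may accept fewer bytes than requested, so loop until done.
        written = os.write(fd, view)
        view = view[written:]
    if durable:
        # The "flush to disk" step: potentially slow, but it is what makes
        # the data likely to survive a crash or power loss.
        os.fsync(fd)


def demo() -> bytes:
    """Write without fsync, then show another reader already sees the data."""
    path = os.path.join(tempfile.mkdtemp(), "demo.log")  # illustrative path
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # Not flushed to disk: the bytes live in the page cache for now.
        append_record(fd, b"first\n")
        # A separate open() still sees the data, because reads are served
        # from the same page cache.
        with open(path, "rb") as reader:
            seen = reader.read()
        # Flushed to disk: returns only after fsync() completes.
        append_record(fd, b"second\n", durable=True)
    finally:
        os.close(fd)
    return seen


if __name__ == "__main__":
    print(demo())
```

The first call is the cheap path the quoted Kafka passage describes. The second shows the extra cost you would pay per record if you needed each write to be durable before continuing.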

Charles Duffy answered Oct 11 '22 13:10