I am using the low-level I/O function 'write' to write some data to disk in my code (C language on Linux). First, I accumulate the data in a memory buffer, and then I use 'write' to write the data to disk when the buffer is full. So what's the best buffer size for 'write'? According to my tests it isn't the bigger the faster, so I am here to look for the answer.
All that said, there's no “industry standard” buffer size and sample rate, as it's all dependent on your computer's processing power. However, recording at 128 to 256 at a sample rate of 48kHz is acceptable for most home recording on modern-day computers.
The upper limit for the maximum buffer size is 32768␠bytes (32␠KB).
The disk buffer is usually quite small, ranging between 8 and 256 MiB, and the page cache is generally all unused main memory.
The write is one of the most basic routines provided by a Unix-like operating system kernel. It writes data from a buffer declared by the user to a given device, such as a file. This is the primary way to output data from a program by directly using a system call.
There is probably some advantage in doing writes which are multiples of the filesystem block size, especially if you are updating a file in place. If you write less than a partial block to a file, the OS has to read the old block, combine in the new contents and then write it out. This doesn't necessarily happen if you rapidly write small pieces in sequence because the updates will be done on buffers in memory which are flushed later. Still, once in a while you could be triggering some inefficiency if you are not filling a block (and a properly aligned one: multiple of block size at an offset which is a multiple of the block size) with each write operation.
This issue of transfer size does not necessarily go away with mmap. If you map a file, and then memcpy
some data into the map, you are making a page dirty. That page has to be flushed at some later time: it is indeterminate when. If you make another memcpy
which touches the same page, that page could be clean now and you're making it dirty again. So it gets written twice. Page-aligned copies of multiples-of a page size will be the way to go.
You'll want it to be a multiple of the CPU page size, in order to use memory as efficiently as possible.
But ideally you want to use mmap instead, so that you never have to deal with buffers yourself.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With