 

Default buffer size for a file on Linux


The Python documentation for open() says of the buffering argument: "If omitted, the system default is used." I am currently on Red Hat Linux 6, but I cannot figure out what default buffering is set for the system.

Can anyone please guide me as to how to determine the default buffering for a system?

asked Aug 12 '13 18:08 by name_masked




1 Answer

Since you linked to the 2.7 docs, I'm assuming you're using 2.7. (In Python 3.x, this all gets a lot simpler, because a lot more of the buffering is exposed at the Python level.)

All open actually does (on POSIX systems) is call fopen, and then, if you've passed anything for buffering, setvbuf. Since you're not passing anything, you just end up with the default buffer from fopen, which is up to your C standard library. (See the source for details. With no buffering argument, it passes -1 to PyFile_SetBufSize, which does nothing unless bufsize >= 0.)
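As a rough illustration of how the buffering argument changes what open gives you, here is a minimal sketch. It assumes Python 3, where buffering is handled by the io module rather than by fopen/setvbuf as in 2.7, so it shows the same argument semantics and not the 2.7 internals. The helper name opened_type is made up for this example.

```python
import os
import tempfile


def opened_type(buffering):
    """Open a scratch file in binary write mode and report the object type.

    `opened_type` is a hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb", buffering=buffering) as f:
            return type(f).__name__
    finally:
        os.remove(path)


if __name__ == "__main__":
    # -1 (the default) lets the runtime choose a buffer size.
    print("buffering=-1    ->", opened_type(-1))
    # 0 disables buffering (binary mode only) and returns the raw file object.
    print("buffering=0     ->", opened_type(0))
    # Any value above 1 requests a buffer of that many bytes.
    print("buffering=65536 ->", opened_type(65536))
```

On CPython 3, the unbuffered case returns the raw FileIO object, while the other two return a BufferedWriter wrapping it.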

If you read the glibc setvbuf manpage, it explains that if you never call any of the buffering functions:

Normally all files are block buffered. When the first I/O operation occurs on a file, malloc(3) is called, and a buffer is obtained.

Note that it doesn't say what size buffer is obtained. This is intentional; it means the implementation can be smart and choose different buffer sizes for different cases. (There is a BUFSIZ constant, but that's only used when you call legacy functions like setbuf; it's not guaranteed to be used in any other case.)

So, what does happen? If you look at the glibc source, it ultimately calls the macro _IO_DOALLOCATE, which can be hooked (or overridden, because glibc unifies C++ streambuf and C stdio buffering). By default, though, it allocates a buffer of _IO_BUFSIZ bytes, which is an alias for the platform-specific macro _G_BUFSIZ, which is 8192.

Of course you probably want to trace down the macros on your own system rather than trust the generic source.
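If tracing macros is inconvenient, one alternative is to measure the buffer empirically: write one byte at a time and watch for the moment the file on disk stops being empty. The sketch below assumes Python 3 (where the buffer belongs to the io module, not to glibc) and a filesystem that reports st_size promptly. The function name first_flush_size is invented for this example, and the number it reports reflects whatever layer is doing the buffering in your interpreter.

```python
import io
import os
import tempfile


def first_flush_size(buffering=-1, limit=4 * 1024 * 1024):
    """Write single bytes until data reaches the file; return the size seen.

    Returns None if nothing was flushed within `limit` bytes.
    `first_flush_size` is a hypothetical helper, not a standard API.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb", buffering=buffering) as f:
            for _ in range(limit):
                f.write(b"x")
                # The file stays empty on disk until the buffer is flushed.
                size = os.fstat(f.fileno()).st_size
                if size > 0:
                    return size
        return None
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("io.DEFAULT_BUFFER_SIZE:", io.DEFAULT_BUFFER_SIZE)
    print("first flush with default buffering:", first_flush_size())
    print("first flush with buffering=4096:", first_flush_size(4096))
```

The size of the first flush is a reasonable proxy for the buffer size, since a buffered writer typically holds data until the buffer can take no more.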


You may wonder why there is no good documented way to get this information. Presumably it's because you're not supposed to care. If you need a specific buffer size, you set one manually; if you trust that the system knows best, just trust it. Unless you're actually working on the kernel or libc, who cares?

In theory, this also leaves open the possibility that the system could do something smart here, like picking a buffer size based on the block size of the file's filesystem, or even based on runtime statistics, although it doesn't look like Linux/glibc, FreeBSD, or OS X do anything other than use a constant. Most likely that's because it really doesn't matter for most applications. (You might want to test that out yourself: use explicit buffer sizes ranging from 1KB to 2MB on some buffered-I/O-bound script and see what the performance differences are.)
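The experiment suggested above could be sketched roughly as follows. This assumes Python 3, a workload of many small writes to a temporary file, and wall-clock timing with time.perf_counter. The names time_buffered_writes and sweep, the 1 MB total, and the 64-byte chunk size are all arbitrary choices for illustration, and results will vary with the disk, filesystem, and page cache.

```python
import os
import tempfile
import time


def time_buffered_writes(bufsize, total_bytes=1 << 20, chunk=64):
    """Time writing `total_bytes` in `chunk`-sized pieces with a given buffer."""
    payload = b"x" * chunk
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        start = time.perf_counter()
        with open(path, "wb", buffering=bufsize) as f:
            for _ in range(total_bytes // chunk):
                f.write(payload)
        # Closing the file flushes the buffer, so the flush is included.
        return time.perf_counter() - start
    finally:
        os.remove(path)


def sweep(sizes=None, **kwargs):
    """Return {buffer size: elapsed seconds} for each size tried."""
    if sizes is None:
        sizes = [1024 << i for i in range(12)]  # 1 KB up to 2 MB
    return {size: time_buffered_writes(size, **kwargs) for size in sizes}


if __name__ == "__main__":
    for size, seconds in sweep().items():
        print(f"{size:>8} bytes: {seconds * 1000:.2f} ms")
```

A single run is noisy, so repeating the sweep a few times and comparing the best or median time per size gives a fairer picture.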

answered Oct 02 '22 14:10 by abarnert