Understanding Buffering in C

Tags: c, windows, gcc

I am having a really hard time understanding buffering in depth, especially in C programming. I have searched on this topic for a long time but haven't found anything satisfying so far.

To be more specific: I do understand the concept behind it (i.e. coordinating operations between different hardware devices and compensating for the difference in speed between those devices), but I would appreciate a fuller explanation of these and other potential reasons for buffering (and by full I mean full: the longer and deeper, the better). It would also be really nice to see some concrete examples of how buffering is implemented in I/O streams.

My other question: I noticed that my programs don't seem to follow some of the rules for buffer flushing, weird as that sounds. Take the following simple fragment:

#include <stdio.h>

int main(void)
{
    FILE * fp = fopen("hallo.txt", "w");

    fputc('A', fp);
    getchar();
    fputc('A', fp);
    getchar();

    return 0;
}

The program is intended to demonstrate that pending input flushes an arbitrary stream immediately when the first getchar() is called. But this simply doesn't happen, no matter how often I try it or how I modify the program. As for stdout (with printf(), for example), the stream is flushed without any input being requested, which also contradicts the rule. So am I misunderstanding this rule, or is there something else to consider?

I am using GNU GCC on Windows 8.1.

Update:

I forgot to ask: I have read on some sites that people refer to e.g. string literals, or even arrays, as buffers. Is this correct, or am I missing something? Please explain this point too.

asked Jan 16 '15 22:01

Lockon2000

3 Answers

The word buffer is used for many different things in computer science. In the most general sense, it is any piece of memory where data is stored temporarily until it is processed or copied to its final destination (or to another buffer).

As you hinted in the question there are many types of buffers, but as a broad grouping:

1. Hardware buffers: These are buffers where data is stored before being moved to a HW device, or where data received from a HW device is stored until the application processes it. They are needed because the I/O operation usually has memory and timing requirements, and the buffer fulfills them. Think of DMA devices that read/write directly to memory: if the memory is not set up properly, the system may crash. Or think of sound devices, which must be fed data with tight timing or playback will work poorly.
2. Cache buffers: These are buffers where data is grouped before being written to, or after being read from, a file/device, so that performance is generally improved.
3. Helper buffers: You move data into/from such a buffer because it makes your algorithm easier.

Case #2 is that of your FILE* example. Imagine that a call to the write system call (WriteFile() in Win32) takes 1ms for just the call plus 1us for each byte (bear with me, things are more complicated in the real world). Then, if you do:

FILE *f = fopen("file.txt", "w");
for (int i=0; i < 1000000; ++i)
    fputc('x', f);
fclose(f);

Without buffering, this code would take 1000000 * (1ms + 1us), that's about 1000 seconds. However, with a buffer of 10000 bytes, there will be only 100 system calls, 10000 bytes each. That would be 100 * (1ms + 10000us) = 100 * 11ms. That's just 1.1 seconds, roughly 900 times faster!

Note also that the OS will do its own buffering, so that the data is written to the actual device using the most efficient size. That will be a HW and cache buffer at the same time!

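About your problem with flushing: files are usually flushed only when they are closed, when they are flushed manually with fflush(), or when their buffer fills up. Some streams, such as stdout when it is connected to a terminal, are typically line-buffered, that is, they are flushed whenever a '\n' is written. The stdin/stdout pair is also special: on many implementations, when you read from stdin, stdout is flushed first. Other files are untouched, only stdout. That is handy if you are writing an interactive program, because the prompt appears before the program waits for input.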

My case #3 is for example when you do:

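FILE *f = fopen("x.txt", "r");   /* fopen(), not open(): fgets() needs a FILE* */
char buffer[1000];
int n = 0;
/* Read one line into the helper buffer, then parse the number out of it. */
if (f != NULL && fgets(buffer, sizeof(buffer), f) != NULL)
    sscanf(buffer, "%d", &n);
if (f != NULL)
    fclose(f);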

You use the buffer to hold a line from the file, and then you parse the data from the line. Yes, you could call fscanf() directly, but other APIs may not have an equivalent function, and moreover you have more control this way: you can analyze the type of line, skip comments, count lines...

Or imagine that you receive one byte at a time, for example from a keyboard. You will just accumulate characters in a buffer and parse the line when the Enter key is pressed. That is what most interactive console programs do.

answered Oct 28 '22 10:10

rodrigo

The noun "buffer" really refers to a usage, not a distinct thing. Any block of storage can serve as a buffer. The term is intentionally used in this general sense in conjunction with various I/O functions, though the docs for the C I/O stream functions tend to avoid that. Taking the POSIX read() function as an example, however: "read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf". The "buffer" in that case simply means the block of memory in which the bytes read will be recorded; it is ordinarily implemented as a char[] or a dynamically-allocated block.

One uses a buffer especially in conjunction with I/O because some devices (especially hard disks) are most efficiently read in medium-to-large chunks, whereas programs often want to consume that data in smaller pieces. Some other forms of I/O, such as network I/O, may inherently come in chunks, so that you must record each whole chunk (in a buffer) or else lose the part you're not immediately ready to consume. Similar considerations apply to output.

As for your test program's behavior, the "rule" you hoped to demonstrate is specific to console I/O, but only one of the streams involved is connected to the console.

answered Oct 28 '22 12:10

John Bollinger

The first question is a bit too broad. Buffering is used in many cases, including message storage before actual usage, DMA uses, speedup usages and so on. In short, the entire buffering thing can be summarized as "save my data, let me continue execution while you do something with the data".

Sometimes you may modify buffers after passing them to functions, sometimes not. Sometimes buffers are hardware, sometimes software. Sometimes they reside in RAM, sometimes in other memory types.

So, please ask a more specific question. As a starting point, use Wikipedia; it is almost always helpful.

As for the code sample, I haven't found any mention of all output buffers being flushed upon getchar. Buffers for files are generally flushed in three cases:

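1. fflush() or an equivalent is called.
2. The file is closed.
3. The buffer becomes full.

Since none of these cases applies while your program is waiting in getchar(), the file has not been flushed at that point. (Application termination is not in this list, although a normal exit, by returning from main() or calling exit(), does flush and close all open streams; an abnormal termination might not.)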

answered Oct 28 '22 10:10

Aneri
