Buffering of standard I/O library

Tags: c, stdio, buffering

In the book Advanced Programming in the UNIX Environment (2nd edition), the author wrote in Section 5.5 (stream operations of the standard I/O library) that:

When a file is opened for reading and writing (the plus sign in the type), the following restrictions apply.

Output cannot be directly followed by input without an intervening fflush, fseek, fsetpos, or rewind.

Input cannot be directly followed by output without an intervening fseek, fsetpos, or rewind, or an input operation that encounters an end of file.

I got confused about this. Could anyone explain a little about this? For example, in what situation the input and output function calls violating the above restrictions will cause unexpected behavior of the program? I guess the reason for the restrictions may be related to the buffering in the library, but I'm not so clear.

asked Jan 16 '13 08:01

pjhades

2 Answers

You aren't allowed to intersperse input and output operations on the same stream without a positioning call (or, after output, a flush) in between. For example, you can't use formatted input to advance to a particular point in the file and then immediately start writing bytes at that point. The restriction lets the implementation assume that, at any time, its single I/O buffer holds either data waiting to be read (by you) or data waiting to be written (to the OS), without doing any safety checks.

FILE *f = fopen( "myfile", "r+" ); /* open for read and write ("rw" is not a valid mode) */
fscanf( f, "hello, world\n" ); /* scan past file header */
fprintf( f, "daturghhhf\n" ); /* write some data - illegal right after input */

This is OK, though, if you do an fseek( f, 0, SEEK_CUR ); between the fscanf and the fprintf because that changes the mode of the I/O buffer without repositioning it.

Why is it done this way? As far as I can tell, because OS vendors often want to support automatic mode switching, but fail. The stdio spec allows a buggy implementation to be compliant, and a working implementation of automatic mode switching simply implements a compatible extension.

answered Oct 16 '22 14:10

Potatoswatter

It's not clear what you're asking.

Your basic question is "Why does the book say I can't do this?" Well, the book says you can't do it because the POSIX/SUS/etc. standard says it's undefined behavior in the fopen specification, which it does to align with the ISO C standard (N1124 working draft, because the final version is not free), 7.19.5.3.

Then you ask, "in what situation the input and output function calls violating the above restrictions will cause unexpected behavior of the program?"

Undefined behavior will always cause unexpected behavior, because the whole point is that you're not allowed to expect anything. (See 3.4.3 and clause 4 in the C standard draft cited above.)

But on top of that, it's not even clear what they could have specified that would make any sense. Look at this:

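#include <stdio.h>

int main(void) {
  FILE *fp = fopen("foo", "r+");  /* "foo" must already exist */
  if (fp == NULL) {
    perror("fopen");
    return 1;
  }
  fseek(fp, 0, SEEK_SET);
  fwrite("foo", 1, 3, fp);
  fseek(fp, 0, SEEK_SET);
  fwrite("bar", 1, 3, fp);
  char buf[4] = { 0 };
  /* Undefined: input directly follows output, with no fflush or fseek. */
  size_t ret = fread(buf, 1, 3, fp);
  printf("%d %s\n", (int)ret, buf);
  fclose(fp);
  return 0;
}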

So, should this print out 3 foo because that's what's on disk, or 3 bar because that's what's in the "conceptual file", or 0 because there's nothing after what's been written so you're reading at EOF? And if you think there's an obvious answer, consider the fact that it's possible that bar has been flushed already—or even that it's been partially flushed, so the disk file now contains boo.

If you're asking the more practical question "Can I get away with it in some circumstances?", well, I believe on most Unix platforms, the above code will give you an occasional segfault, but 3 xyz (either 3 uninitialized characters, or in more complicated cases 3 characters that happened to be in the buffer before it got overwritten) the rest of the time. So, no, you can't get away with it.

Finally, you say, "I guess the reason for the restrictions may be related to the buffering in the library, but I'm not so clear." This sounds like you're asking about the rationale.

You're right that it's about buffering. As I pointed out above, there really is no intuitive right thing to do here—but also, think about the implementation. Remember that the Unix way has always been "if the simplest and most efficient code is good enough, do that".

There are four ways you could implement something like stdio:

1. Use a shared buffer for read and write, and write code to switch contexts as needed. This is going to be a bit complicated, and will flush buffers more often than you'd ideally like.
2. Use two separate buffers, and cache-style code to determine when one operation needs to copy from and/or invalidate the other buffer. This is even more complicated, and makes a FILE object take twice as much memory.
3. Use a shared buffer, and just don't allow interleaving reads and writes without explicit flushes in between. This is dead-simple, and as efficient as possible.
4. Use a shared buffer, and implicitly flush between interleaved reads and writes. This is almost as simple, and almost as efficient, and a lot safer, but not really any better in any way other than safety.

So, Unix went with #3, and documented it, and SUS, POSIX, C89, etc. standardized that behavior.

You might say, "Come on, it can't be that inefficient." Well, you have to remember that Unix was designed for low-end 1970s systems, around the basic philosophy that it's not worth trading off even a little efficiency unless there's some actual benefit. But, most importantly, consider that stdio has to handle trivial functions like getc and putc, not just fancy stuff like fscanf and fprintf, and adding anything to those functions (or macros) that makes them 5x as slow would make a huge difference in a lot of real-world code.

If you look at modern implementations from, e.g., *BSD, glibc, Darwin, MSVCRT, etc. (most of which are open source, or at least commercial-but-shared-source), most of them do things the same way. A few add safety checks, but they generally give you an error for interleaving rather than implicitly flushing—after all, if your code is wrong, it's better to tell you that your code is wrong than to try to DWIM.

For example, look at early Darwin (OS X) fopen, fread, and fwrite (chosen because it's nice and simple, and has easily-linkable code that's syntax-colored but also copy-pastable). All that fread has to do is copy bytes out of the buffer, and refill the buffer if it runs out. You can't get any simpler than that.

answered Oct 16 '22 13:10

abarnert
