Why is “while ( !feof (file) )” always wrong?

Tags: c, file, while-loop, eof, feof

What is wrong with using feof() to control a read loop? For example:

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        char *path = "stdin";
        FILE *fp = argc > 1 ? fopen(path=argv[1], "r") : stdin;

        if( fp == NULL ){
            perror(path);
            return EXIT_FAILURE;
        }

        while( !feof(fp) ){  /* THIS IS WRONG */
            /* Read and process data from file… */
        }
        if( fclose(fp) != 0 ){
            perror(path);
            return EXIT_FAILURE;
        }
        return EXIT_SUCCESS;
    }

What is wrong with this loop?

asked Mar 25 '11 11:03

William Pursell

1 Answer

TL;DR

while(!feof) is wrong because it tests for something that is irrelevant and fails to test for something that you need to know. The result is that you are erroneously executing code that assumes that it is accessing data that was read successfully, when in fact this never happened.

I'd like to provide an abstract, high-level perspective. So continue reading if you're interested in what while(!feof) actually does.
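
To make the failure concrete, here is a minimal sketch that is not part of the original question. The function names `count_with_feof` and `count_with_result` are made up for illustration. Both count the integers in a stream: the first uses the broken pattern, the second uses the read itself as the loop condition.

```c
#include <stdio.h>

/* Broken pattern: feof() only reports that a PREVIOUS read hit end-of-file,
 * so the loop body runs once more after the last successful read. */
int count_with_feof(FILE *fp)
{
    int value = 0;
    int count = 0;

    while (!feof(fp)) {
        fscanf(fp, "%d", &value);   /* result ignored: may have read nothing */
        count++;                    /* on the final pass this counts stale data */
    }
    return count;
}

/* Correct pattern: attempt the read, then act only if it succeeded. */
int count_with_result(FILE *fp)
{
    int value;
    int count = 0;

    while (fscanf(fp, "%d", &value) == 1) {
        count++;
    }
    return count;
}
```

For a stream holding `1 2 3` followed by a newline, the first function reports 4 items, because the final pass runs with a failed `fscanf` and a stale `value`; the second reports 3. Worse, if the input contains a non-numeric token, the `feof` loop never terminates: the failed conversion consumes nothing, so end-of-file is never reached.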

Concurrency and simultaneity

I/O operations interact with the environment. The environment is not part of your program, and not under your control. The environment truly exists "concurrently" with your program. As with all things concurrent, questions about the "current state" don't make sense: There is no concept of "simultaneity" across concurrent events. Many properties of state simply don't exist concurrently.

Let me make this more precise: Suppose you want to ask, "do you have more data". You could ask this of a concurrent container, or of your I/O system. But the answer is generally unactionable, and thus meaningless. So what if the container says "yes" – by the time you try reading, it may no longer have data. Similarly, if the answer is "no", by the time you try reading, data may have arrived. The conclusion is that there simply is no property like "I have data", since you cannot act meaningfully in response to any possible answer. (The situation is slightly better with buffered input, where you might conceivably get a "yes, I have data" that constitutes some kind of guarantee, but you would still have to be able to deal with the opposite case. And with output the situation is certainly just as bad as I described: you never know if that disk or that network buffer is full.)

So we conclude that it is impossible, and in fact unreasonable, to ask an I/O system whether it will be able to perform an I/O operation. The only possible way we can interact with it (just as with a concurrent container) is to attempt the operation and check whether it succeeded or failed. Only at the moment when you interact with the environment can you know whether the interaction was actually possible, and at that point you must commit to performing the interaction. (This is a "synchronisation point", if you will.)

EOF

Now we get to EOF. EOF is the response you get from an attempted I/O operation. It means that you were trying to read or write something, but when doing so you failed to read or write any data, and instead the end of the input or output was encountered. This is true for essentially all the I/O APIs, whether it be the C standard library, C++ iostreams, or other libraries. As long as the I/O operations succeed, you simply cannot know whether further, future operations will succeed. You must always first try the operation and then respond to success or failure.
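
A small C sketch of this point, using a hypothetical helper name (`eof_is_reported_after_read`) and assuming the caller passes a stream that is empty and positioned at its start: the end-of-file indicator is clear until a read has actually been attempted and has failed.

```c
#include <stdio.h>

/* Returns 1 if feof() was false before the first read of an EMPTY stream
 * and became true only after a read attempt failed; returns 0 otherwise. */
int eof_is_reported_after_read(FILE *empty)
{
    int before = feof(empty);   /* 0: nothing has been attempted yet */
    int ch = fgetc(empty);      /* the attempt itself: yields EOF */
    int after = feof(empty);    /* nonzero: the failed read set the indicator */

    return before == 0 && ch == EOF && after != 0;
}
```

Even though the stream has no data at all, `feof` cannot say so in advance; only the failed `fgetc` makes the condition observable.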

Examples

In each of the examples, note carefully that we first attempt the I/O operation and then consume the result if it is valid. Note further that we always must use the result of the I/O operation, though the result takes different shapes and forms in each example.

C stdio, read from a file:

```
for (;;) {
    size_t n = fread(buf, 1, bufsize, infile);
    consume(buf, n);
    if (n == 0) { break; }
}
```

The result we must use is n, the number of elements that were read (which may be as little as zero).
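
Once `fread` has come up short, it becomes meaningful to ask why. The following sketch shows one way to do that; the function `sum_bytes` and its return convention are assumptions made for this illustration, not something from the snippet above.

```c
#include <stdio.h>

/* Adds up every byte in the stream.
 * Returns 0 after a clean end-of-file, -1 if a read error occurred. */
int sum_bytes(FILE *in, unsigned long *total)
{
    unsigned char buf[4096];
    size_t n;

    *total = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        for (size_t i = 0; i < n; i++) {
            *total += buf[i];
        }
    }
    /* fread returned 0: only now does the stream's state tell us the reason. */
    return ferror(in) ? -1 : 0;
}
```

Here `ferror` is consulted after the read has failed, which is the legitimate place for `feof` and `ferror`: diagnosing a failure, never predicting one.
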

C stdio, scanf:

```
for (int a, b, c; scanf("%d %d %d", &a, &b, &c) == 3; ) {
    consume(a, b, c);
}
```

The result we must use is the return value of scanf, the number of elements converted.
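
The return value also says how the loop ended. As a hedged sketch (the enum `stop_reason` and the function `read_triples` are invented for this example), the same loop on an arbitrary stream can report whether it stopped at end-of-file, on a read error, or on an incomplete or malformed triple:

```c
#include <stdio.h>

enum stop_reason { STOP_EOF, STOP_MALFORMED, STOP_READ_ERROR };

/* Reads whitespace-separated integer triples from the stream.
 * *triples receives the number of complete triples that were read. */
enum stop_reason read_triples(FILE *in, int *triples)
{
    int a, b, c;
    int rc;

    *triples = 0;
    while ((rc = fscanf(in, "%d %d %d", &a, &b, &c)) == 3) {
        (*triples)++;               /* a, b and c are all valid here */
    }
    if (rc == EOF) {
        /* Input ended (or failed) before the first conversion. */
        return ferror(in) ? STOP_READ_ERROR : STOP_EOF;
    }
    return STOP_MALFORMED;          /* only 0, 1 or 2 items were converted */
}
```
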
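
C++, iostreams formatted extraction:

```
for (int n; std::cin >> n; ) {
    consume(n);
}
```

The result we must use is std::cin itself, which can be evaluated in a boolean context and tells us whether the extraction succeeded. (The conversion is equivalent to !fail(), which is not quite the same as good(): a stream that has reached EOF during a successful extraction still converts to true.)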

C++, iostreams getline:

    for (std::string line; std::getline(std::cin, line); ) {
        consume(line);
    }

The result we must use is again std::cin, just as before.

POSIX, write(2) to flush a buffer:

    char const * p = buf;
    ssize_t n = bufsize;
    for (ssize_t k = bufsize; (k = write(fd, p, n)) > 0; p += k, n -= k) {}
    if (n != 0) { /* error, failed to write complete buffer */ }

The result we use here is k, the number of bytes written. The point here is that we can only know how many bytes were written after the write operation.
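
The same idea can be packaged as a reusable helper. This is a sketch only: the name `write_all` is hypothetical, and the retry on `EINTR` is an addition that the snippet above does not make.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Writes all len bytes to fd.
 * Returns 0 on success, -1 on error (errno is left as set by write). */
int write_all(int fd, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0) {
        ssize_t k = write(fd, p, len);
        if (k < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted before any byte was written: retry */
            }
            return -1;
        }
        p += k;                 /* partial write: skip what was accepted */
        len -= (size_t)k;
    }
    return 0;
}
```

The loop never asks whether the descriptor "has room"; it attempts the write and lets the returned count decide what remains.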

POSIX getline()
```
char *buffer = NULL;
size_t bufsiz = 0;
ssize_t nbytes;
while ((nbytes = getline(&buffer, &bufsiz, fp)) != -1)
{
    /* Use nbytes of data in buffer */
}
free(buffer);
```
The result we must use is nbytes, the number of bytes up to and including the newline (or EOF if the file did not end with a newline).

Note that the function explicitly returns -1 (and not EOF!) when an error occurs or it reaches EOF.
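
Because `-1` covers both cases, the stream has to be inspected afterwards if the difference matters. A minimal sketch, assuming a POSIX.1-2008 system; the function `longest_line` and its return convention are made up for this example:

```c
#define _POSIX_C_SOURCE 200809L   /* requests getline(); place before the includes */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Returns the length of the longest line (newline excluded),
 * or -1 if reading stopped because of an error rather than end-of-file. */
long longest_line(FILE *fp)
{
    char *buffer = NULL;
    size_t bufsiz = 0;
    ssize_t nbytes;
    long longest = 0;

    while ((nbytes = getline(&buffer, &bufsiz, fp)) != -1) {
        if (nbytes > 0 && buffer[nbytes - 1] == '\n') {
            nbytes--;           /* do not count the line terminator */
        }
        if (nbytes > longest) {
            longest = nbytes;
        }
    }
    free(buffer);               /* getline allocated (and possibly grew) this */
    return ferror(fp) ? -1 : longest;
}
```

As before, `ferror` is consulted only after the read has reported failure.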

You may notice that we very rarely spell out the actual word "EOF". We usually detect the error condition in some other way that is more immediately interesting to us (e.g. failure to perform as much I/O as we had desired). In every example there is some API feature that could tell us explicitly that the EOF state has been encountered, but this is in fact not a terribly useful piece of information. It is much more of a detail than we often care about. What matters is whether the I/O succeeded, more so than how it failed.

A final example that actually queries the EOF state: Suppose you have a string and want to test that it represents an integer in its entirety, with no extra bits at the end except whitespace. Using C++ iostreams, it goes like this:

    std::string input = "   123   ";   // example

    std::istringstream iss(input);
    int value;
    if (iss >> value >> std::ws && iss.get() == EOF) {
        consume(value);
    } else {
        // error, "input" is not parsable as an integer
    }

We use two results here. The first is iss, the stream object itself, to check that the formatted extraction to value succeeded. But then, after also consuming whitespace, we perform another I/O operation, iss.get(), and expect it to fail as EOF, which is the case if the entire string has already been consumed by the formatted extraction.

In the C standard library you can achieve something similar with the strto*l functions by checking that the end pointer has reached the end of the input string.
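
A sketch of that C approach, using `strtol`. The helper name `parse_whole_int` and the choice to reject values outside the range of `int` are assumptions made for this example.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Returns 1 and stores the value in *out if text is exactly one decimal int,
 * optionally surrounded by whitespace; returns 0 otherwise. */
int parse_whole_int(const char *text, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);     /* skips leading whitespace itself */
    if (end == text) {
        return 0;                   /* no digits were converted */
    }
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) {
        return 0;                   /* out of range for int */
    }
    while (isspace((unsigned char)*end)) {
        end++;                      /* allow trailing whitespace */
    }
    if (*end != '\0') {
        return 0;                   /* extra characters after the number */
    }
    *out = (int)v;
    return 1;
}
```

With this helper, `"   123   "` is accepted and yields 123, while `"123abc"` and an empty string are rejected. The end pointer plays the role that `iss.get() == EOF` plays in the iostreams version.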

answered Oct 17 '22 09:10

Kerrek SB
