
Is the inconsistency of C++'s istream::eof() a bug in the spec or a bug in the implementation?

Tags: c++, iostream

The following program demonstrates an inconsistency in the way that std::istream (specifically in my test code, std::istringstream) sets eof().

#include <sstream>
#include <cassert>

int main()
{
    // EXHIBIT A:
    {
        // An empty stream doesn't recognize that it's empty...
        std::istringstream stream( "" );
        assert( !stream.eof() );        // (Not yet EOF. Maybe should be.)
        // ...until I read from it:
        const int c = stream.get();
        assert( c < 0 );                // (We received garbage.)
        assert( stream.eof() );         // (Now we're EOF.)
    }
    // THE MORAL: EOF only happens when actually attempting to read PAST the end of the stream.

    // EXHIBIT B:
    {
        // A stream that still has data beyond the current read position...
        std::istringstream stream( "c" );
        assert( !stream.eof() );        // (Clearly not yet EOF.)
        // ... clearly isn't eof(). But when I read the last character...
        const int c = stream.get();
        assert( c == 'c' );             // (We received something legit.)
        assert( !stream.eof() );        // (But we're already EOF?! THIS ASSERT FAILS.)
    }
    // THE MORAL: EOF happens when reading the character BEFORE the end of the stream.

    // Conclusion: MADNESS.
    return 0;
}

So, eof() "fires" when you read the last character, the one just before the actual end-of-file. But if the stream is empty, it only fires when you actually attempt to read a character. Does eof() mean "you just tried to read off the end" or "if you try to read again, you'll go off the end"? The answer is inconsistent.

Moreover, whether the assert fires or not depends on the compiler and the standard library it ships with. Apple Clang 4.1, for example, fires the assertion (it raises eof() when reading the last character). GCC 4.7.2, for example, does not.

This inconsistency makes it hard to write sensible loops that read through a stream but handle both empty and non-empty streams well.

OPTION 1:

while( stream && !stream.eof() )
{
    const int c = stream.get();    // BUG: Wrong if stream was empty before the loop.
    // ...
}

OPTION 2:

while( stream )
{
    const int c = stream.get();
    if( stream.eof() )
    {
        // BUG: Wrong when c in fact got the last character of the stream.
        break;
    }
    // ...
}

So, friends, how do I write a loop that parses through a stream, dealing with each character in turn, handles every character, but stops without fuss either when we hit the EOF, or in the case when the stream is empty to begin with, never starts?

And okay, the deeper question: I have the intuition that using peek() could maybe work around this eof() inconsistency somehow, but...holy crap! Why the inconsistency?

OldPeculier asked Nov 02 '12 23:11


2 Answers

The eof() flag is only useful for determining whether you hit end of file after some operation. Its primary use is to avoid an error message when a read failed for the legitimate reason that there wasn't anything more to read. Trying to control a loop using eof() is bound to fail. In all cases you need to check, after you tried to read, whether the read was successful. Before the attempt, the stream can't know what you are going to read.

The semantics of eof() are defined precisely as "this flag gets set when reading the stream caused the stream buffer to return a failure". This statement isn't quite as easy to find, if I recall correctly, but this is what it comes down to. At some point the standard also says that the stream is allowed to read more than it has to in some situations, which may cause eof() to be set when you don't necessarily expect it. One such example is reading a character: the stream may end up detecting that there is nothing following that character and set eof().
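A case where this is easy to observe is formatted extraction. The following is a minimal sketch, not part of the original answer: the names ReadResult, read_int and demo_read_int are made up for illustration. Parsing an int has to look past the last digit to know the number is complete, so when the digits run up to the end of the stream the read succeeds and eof() is set at the same time.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical result type: the parsed value plus the two flags of interest.
struct ReadResult {
    int value;
    bool failed;  // fail(): the extraction did not succeed
    bool at_eof;  // eof(): the stream buffer reported end of input
};

// Hypothetical helper: attempts one formatted int extraction from text.
ReadResult read_int(const std::string& text)
{
    std::istringstream in(text);
    int value = 0;
    in >> value;  // may set eofbit even though the extraction succeeds
    return {value, in.fail(), in.eof()};
}

void demo_read_int()
{
    // Digits run to the end: the read succeeds AND eof() is set.
    const ReadResult a = read_int("42");
    assert(a.value == 42 && !a.failed && a.at_eof);

    // A trailing space stops the parse before the end: no eof() yet.
    const ReadResult b = read_int("42 ");
    assert(b.value == 42 && !b.failed && !b.at_eof);
}
```

In both cases the read itself was successful, which is why fail() rather than eof() is the thing to test after an extraction.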

If you want to handle an empty stream, it's trivial: look at something from the stream and proceed only if you know it's not empty:

if (stream.peek() != std::char_traits<char>::eof()) {
    do_what_needs_to_be_done_for_a_non_empty_stream();
}
else {
    do_something_else();
}
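For the per-character loop asked about in the question, one common approach is to let the read itself be the loop condition, so eof() is never consulted. This is a minimal sketch under that assumption; count_chars, count_chars_in and demo_char_loop are hypothetical names, and counting stands in for whatever per-character handling is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: visits every character of a stream, one get() at a time.
std::size_t count_chars(std::istream& in)
{
    std::size_t n = 0;
    char ch;
    // get(ch) returns the stream itself, which converts to false as soon as
    // an extraction fails, so the body runs only for characters actually read.
    while (in.get(ch)) {
        ++n;  // "handle" the character; here we just count it
    }
    return n;
}

// Convenience wrapper for trying the loop on a string.
std::size_t count_chars_in(const std::string& text)
{
    std::istringstream in(text);
    return count_chars(in);
}

void demo_char_loop()
{
    assert(count_chars_in("") == 0);   // empty stream: the loop body never runs
    assert(count_chars_in("c") == 1);  // the last character is still handled
}
```

Because the condition tests whether the extraction succeeded, the result does not depend on when a particular library chooses to set eof().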
Dietmar Kühl answered Oct 12 '22 22:10


Never, ever check for eof alone.

The eof flag (which is the same as the eofbit bit flag in a value returned by rdstate()) is set when end-of-file is reached during an extract operation. If there were no extract operations, eofbit is never set, which is why your first check returns false.

However, eofbit is no indication as to whether the operation was successful. For that, check failbit|badbit in rdstate(). failbit means "there was a logical error", and badbit means "there was an I/O error". Conveniently, there's a fail() function that returns exactly rdstate() & (failbit|badbit). Even more conveniently, there's an operator bool() function that returns !fail(). So you can do things like while (stream.read(buffer, size)) { ... }, where buffer is a char array and size is the number of characters to read.

If the operation has failed, you may check eofbit, badbit and failbit separately to figure out why it has failed.
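As a rough illustration of that last step, here is a sketch that reads whitespace-separated integers and then inspects the flags to report why the loop stopped. StopReason, sum_ints and demo_sum_ints are hypothetical names, and the order of the checks is one reasonable choice rather than the only one.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical classification of why reading stopped.
enum class StopReason { EndOfInput, BadFormat, IoError };

// Hypothetical helper: sums ints from a stream and reports why it stopped.
StopReason sum_ints(std::istream& in, long& sum)
{
    sum = 0;
    int value;
    // The extraction is the loop condition: the body runs only on success.
    while (in >> value) {
        sum += value;
    }
    // The read has failed; only now look at the individual flags.
    if (in.bad()) return StopReason::IoError;     // badbit: I/O error
    if (in.eof()) return StopReason::EndOfInput;  // failed because input ran out
    return StopReason::BadFormat;                 // failbit alone: not an int
}

void demo_sum_ints()
{
    long sum = 0;

    std::istringstream clean("1 2 3");
    assert(sum_ints(clean, sum) == StopReason::EndOfInput && sum == 6);

    std::istringstream dirty("1 2 x");
    assert(sum_ints(dirty, sum) == StopReason::BadFormat && sum == 3);
}
```

Note that eof() is consulted only after a read has already failed, to tell a normal end of input apart from malformed data.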

n. 1.8e9-where's-my-share m. answered Oct 13 '22 00:10