
Why is failbit set when eof is found on read?

Tags: c++, eof, fstream

I've read that <fstream> predates <exception>. Ignoring the fact that exceptions on fstream aren't very informative, I have the following question:

It's possible to enable exceptions on file streams using the exceptions() method.

ifstream stream;
stream.exceptions(ifstream::failbit | ifstream::badbit);
stream.open(filename.c_str(), ios::binary);

Any attempt to open a nonexistent file, a file without the correct permissions, or any other I/O problem will result in an exception. This works very well with an assertive programming style: the file was supposed to be there and be readable, and if those conditions aren't met, we get an exception. If I weren't sure whether the file could safely be opened, I could use other functions to test for it.
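
A minimal sketch of this assertive style, assuming a hypothetical helper named try_open (the name and the error message are illustrative only). It relies on the stream throwing std::ios_base::failure once failbit is in the exception mask:

```cpp
#include <fstream>
#include <ios>
#include <iostream>
#include <string>

// Hypothetical helper: returns true if the file could be opened for reading.
// With failbit in the exception mask, a failed open() throws instead of
// silently leaving the stream in a failed state.
bool try_open(const std::string& filename)
{
    std::ifstream stream;
    stream.exceptions(std::ifstream::failbit | std::ifstream::badbit);
    try {
        stream.open(filename.c_str(), std::ios::binary);
    } catch (const std::ios_base::failure& e) {
        // what() is implementation-defined and often not very descriptive.
        std::cerr << "cannot open " << filename << ": " << e.what() << '\n';
        return false;
    }
    return true;
}
```

Calling try_open with a path that does not exist should take the catch branch and return false.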

But now suppose I try to read into a buffer, like this:

char buffer[10];
stream.read(buffer, sizeof(buffer)); 

If the stream detects the end-of-file before filling the buffer, it sets the failbit, and an exception is thrown if exceptions were enabled. Why? What's the point of this? I could have checked for that myself just by testing eof() after the read:

char buffer[10];
stream.read(buffer, sizeof(buffer));
if (stream.eof()) // or stream.gcount() != sizeof(buffer)
    // handle eof myself

This design choice prevents me from using standard exceptions on streams and forces me to create my own exception handling on permissions or I/O errors. Or am I missing something? Is there any way out? For example, can I easily test if I can read sizeof(buffer) bytes on the stream before doing so?

ceztko asked Jul 21 '11 19:07


3 Answers

The failbit is designed to allow the stream to report that some operation failed to complete successfully. This includes errors such as failing to open the file, trying to read data that doesn't exist, and trying to read data of the wrong type.

The particular case you're asking about is reprinted here:

char buffer[10];
stream.read(buffer, sizeof(buffer)); 

Your question is why failbit is set when the end-of-file is reached before all of the input is read. The reason is that this means that the read operation failed: you asked to read 10 characters, but there weren't enough characters in the file. Consequently, the operation did not complete successfully, and the stream sets failbit (along with eofbit) to let you know this, even though the characters that were available are still stored in the buffer.
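
The stream state after such a short read can be inspected with a small sketch like the following. It uses a std::istringstream as a stand-in for the file stream, and the names ShortReadResult and short_read are made up for illustration:

```cpp
#include <ios>
#include <sstream>
#include <string>

// State observed after asking read() for 10 bytes.
struct ShortReadResult {
    std::streamsize count; // bytes actually stored, as reported by gcount()
    bool eof;
    bool fail;
    bool bad;
};

ShortReadResult short_read(const std::string& contents)
{
    std::istringstream stream(contents); // stands in for the file stream
    char buffer[10];
    stream.read(buffer, sizeof(buffer));
    return ShortReadResult{ stream.gcount(), stream.eof(), stream.fail(), stream.bad() };
}
```

For a 4-character input, count is 4 and both eof and fail are true while bad stays false. For an input longer than 10 characters, count is 10 and neither flag is set.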

If you want to do a read operation where you want to read up to some number of characters, you can use the readsome member function:

char buffer[10];
streamsize numRead = stream.readsome(buffer, sizeof(buffer)); 

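This function reads the characters that are immediately available, up to the number requested, but unlike read it doesn't set failbit if fewer characters than requested are extracted. In other words, it says "try to read this many characters, but it's not an error if you can't. Just let me know how much you read." This contrasts with read, which says "I want precisely this many characters, and it's an error if you can't do it." Be aware that readsome only has to return what is already sitting in the stream's buffer, so on a file stream it may return fewer characters than remain in the file, possibly zero, depending on the implementation.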

EDIT: An important detail I forgot to mention is that eofbit can be set without triggering failbit. For example, suppose that I have a text file that contains the text

137

without any newlines or trailing whitespace afterwards. If I write this code:

ifstream input("myfile.txt");

int value;
input >> value;

Then at this point input.eof() will return true, because when reading the characters from the file the stream hit the end of the file trying to see if there were any other characters in the stream. However, input.fail() will not return true, because the operation succeeded - we can indeed read an integer from the file.
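
The same observation can be reproduced without a file. The sketch below uses a std::istringstream in place of myfile.txt; ExtractResult and extract_int are illustrative names only:

```cpp
#include <sstream>
#include <string>

// Flags observed after extracting one int from `text`.
struct ExtractResult {
    int value;
    bool eof;
    bool fail;
};

ExtractResult extract_int(const std::string& text)
{
    std::istringstream input(text); // stands in for ifstream input("myfile.txt")
    int value = 0;
    input >> value;
    return ExtractResult{ value, input.eof(), input.fail() };
}
```

With the input "137", eof is true and fail is false. With a trailing newline ("137\n"), extraction stops at the newline and neither flag is set. With an empty input, both flags are set because no integer could be read.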

Hope this helps!

templatetypedef answered Nov 14 '22 17:11


Using the underlying buffer directly seems to do the trick:

char buffer[10];
streamsize num_read = stream.rdbuf()->sgetn(buffer, sizeof(buffer));

absence answered Nov 14 '22 16:11


Building on @absence's answer, here is a function readeof() that does the same as read() but doesn't set failbit on EOF. Real read failures have also been tested, such as a transfer interrupted by pulling out a USB stick or by a dropped link while accessing a network share. It has been tested on Windows 7 with VS2010 and VS2013, and on Linux with gcc 4.8.1. On Linux, only USB stick removal has been tried.

#include <iostream>
#include <fstream>
#include <stdexcept>

using namespace std;

streamsize readeof(istream &stream, char *buffer, streamsize count)
{
    if (count == 0 || stream.eof())
        return 0;

    streamsize offset = 0;
    streamsize reads;
    do
    {
        // This consistently fails on gcc (linux) 4.8.1 with failbit set on read
        // failure. This apparently never fails on VS2010 and VS2013 (Windows 7)
        reads = stream.rdbuf()->sgetn(buffer + offset, count);

        // This rarely sets failbit on VS2010 and VS2013 (Windows 7) on read
        // failure of the previous sgetn()
        (void)stream.rdstate();

        // On gcc 4.8.1 (Linux) and VS2010/VS2013 (Windows 7) this consistently
        // sets eofbit when the preceding sgetn() hit end-of-file. If the
        // previous rdstate() restored failbit on Windows, it should also
        // throw when exceptions are enabled, or simply return otherwise. On
        // Windows it usually sets eofbit even on a real read failure
        (void)stream.peek();

        if (stream.fail())
            throw runtime_error("Stream I/O error while reading");

        offset += reads;
        count -= reads;
    } while (count != 0 && !stream.eof());

    return offset;
}

#define BIGGER_BUFFER_SIZE 200000000

int main(int argc, char* argv[])
{
    ifstream stream;
    stream.exceptions(ifstream::badbit | ifstream::failbit);
    stream.open("<big file on usb stick>", ios::binary);

    char *buffer = new char[BIGGER_BUFFER_SIZE];

    streamsize reads = readeof(stream, buffer, BIGGER_BUFFER_SIZE);

    if (stream.eof())
        cout << "eof" << endl << flush;

    delete[] buffer; // allocated with new[], so it must be freed with delete[]

    return 0;
}

Bottom line: on Linux the behavior is more consistent and meaningful. With exceptions enabled, a real read failure throws from sgetn(). Windows, by contrast, treats read failures as EOF most of the time.

ceztko answered Nov 14 '22 16:11