
When does `ifstream::readsome` set `eofbit`?

Tags: c++, iostream

This code loops forever:

#include <iostream>
#include <fstream>
#include <sstream>

int main(int argc, char *argv[])
{
    std::ifstream f(argv[1]);
    std::ostringstream ostr;

    while(f && !f.eof())
    {
        char b[5000];
        std::size_t read = f.readsome(b, sizeof b);
        std::cerr << "Read: " << read << " bytes" << std::endl;
        ostr.write(b, read);
    }
}

It's because readsome is never setting eofbit.

cplusplus.com says:

Errors are signaled by modifying the internal state flags:

eofbit The get pointer is at the end of the stream buffer's internal input array when the function is called, meaning that there are no positions to be read in the internal buffer (which may or not be the end of the input sequence). This happens when rdbuf()->in_avail() would return -1 before the first character is extracted.

failbit The stream was at the end of the source of characters before the function was called.

badbit An error other than the above happened.

The standard says almost the same thing:

[C++11: 27.7.2.3]: streamsize readsome(char_type* s, streamsize n);

32. Effects: Behaves as an unformatted input function (as described in 27.7.2.3, paragraph 1). After constructing a sentry object, if !good() calls setstate(failbit) which may throw an exception, and return. Otherwise extracts characters and stores them into successive locations of an array whose first element is designated by s.

  • If rdbuf()->in_avail() == -1, calls setstate(eofbit) (which may throw ios_base::failure (27.5.5.4)), and extracts no characters;
  • If rdbuf()->in_avail() == 0, extracts no characters
  • If rdbuf()->in_avail() > 0, extracts min(rdbuf()->in_avail(),n)).

33. Returns: The number of characters extracted.

That the in_avail() == 0 condition is a no-op implies that ifstream::readsome itself is a no-op if the stream buffer is empty, but the in_avail() == -1 condition implies that it will set eofbit when some other operation has led to in_avail() == -1.

This seems inconsistent, even allowing for the "some" nature of readsome.

So what are the semantics of readsome and eof? Have I interpreted them correctly? Are they an example of poor design in the streams library?


(Stolen from the [IMO] invalid libstdc++ bug 52169.)

Asked by Lightness Races in Orbit, Feb 08 '12 10:02



2 Answers

I think this is a customization point, not really used by the default stream implementations.

in_avail() returns the number of chars it can see in the internal buffer, if any. Otherwise it calls showmanyc() to try to detect if chars are known to be available elsewhere, so a buffer fill request is guaranteed to succeed.

In turn, showmanyc() will return the number of chars it knows about, if any, or -1 if it knows that a read will fail, or 0 if it doesn't have a clue.

The default implementation (basic_streambuf) always returns 0, so that is what you get unless you have a stream with some other streambuf overriding showmanyc.

Your loop is essentially read-as-many-chars-as-you-know-is-safe, and it gets stuck when that is zero (meaning "not sure").
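
To see the difference between the 0 and -1 cases, here is a small sketch with two made-up stream buffers (UnknownBuf, ExhaustedBuf and readsome_sets_eof are hypothetical names, not anything from the library). Both have an empty get area, so in_avail() falls through to showmanyc().

```cpp
#include <ios>
#include <istream>
#include <streambuf>

// Hypothetical buffer that keeps the default showmanyc(), which returns 0
// ("no idea whether more characters exist").
struct UnknownBuf : std::streambuf {};

// Hypothetical buffer whose showmanyc() returns -1
// ("a read is known to fail").
struct ExhaustedBuf : std::streambuf {
protected:
    std::streamsize showmanyc() override { return -1; }
};

// Calls readsome() once on a stream over sb and reports whether
// eofbit ended up set.
bool readsome_sets_eof(std::streambuf& sb)
{
    std::istream in(&sb);
    char c;
    in.readsome(&c, 1);
    return in.eof();
}
```

With UnknownBuf the call extracts nothing and leaves eofbit clear, which is the situation your loop spins on. With ExhaustedBuf the same call sets eofbit.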

Answered by Bo Persson, Sep 29 '22 10:09



I don't think that readsome() is meant for what you're trying to do (read from a file on disk)... from cplusplus.com:

The function is intended to be used to read binary data from certain types of asynchronic sources that may wait for more characters, since it stops reading when the local buffer exhausts, avoiding potential unexpected delays.

So it sounds like readsome() is intended for streams from a network socket or something like that, and you probably want to just use read().
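
Something along these lines should work with read(). It is a minimal sketch: read_all is a hypothetical helper name, and the 5000-byte chunk size is just carried over from the question.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: copies everything left in the stream into a string.
std::string read_all(std::istream& in)
{
    std::ostringstream out;
    char buf[5000];

    // read() sets eofbit and failbit when it hits the end before filling
    // buf, but gcount() still reports the bytes of that final short chunk.
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        out.write(buf, in.gcount());
    }
    return out.str();
}
```

Unlike the readsome() loop, this one terminates: once the stream has failed, the next read() extracts nothing, gcount() is 0, and the condition is false. For a file you would pass in a std::ifstream, probably opened with std::ios::binary.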

Answered by bdow, Sep 29 '22 10:09
