Let's say we have a stream containing simply:
hello
Note that there's no extra \n at the end like there often is in a text file. Now, the following simple code shows that the eof bit is set on the stream after extracting a single std::string.
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, const char* argv[])
{
    std::stringstream ss("hello");
    std::string result;
    ss >> result;
    std::cout << ss.eof() << std::endl; // Outputs 1
    return 0;
}
However, I can't see why this would happen according to the standard (I'm reading C++11 - ISO/IEC 14882:2011(E)). operator>>(basic_istream<...>&, basic_string<...>&) is defined as behaving like a formatted input function. This means it constructs a sentry object which proceeds to eat away whitespace characters. In this example, there are none, so the sentry construction completes with no problems. When converted to a bool, the sentry object gives true, so the extractor gets on with the actual extraction of the string.
The extraction is then defined as:
Characters are extracted and appended until any of the following occurs:
- n characters are stored;
- end-of-file occurs on the input sequence;
- isspace(c,is.getloc()) is true for the next available input character c.
After the last character (if any) is extracted, is.width(0) is called and the sentry object k is destroyed. If the function extracts no characters, it calls is.setstate(ios::failbit), which may throw ios_base::failure (27.5.5.4).
Nothing here actually causes the eof bit to be set. Yes, extraction stops if it hits the end-of-file, but it doesn't set the bit. In fact, the eof bit should only be set if we do another ss >> result;, because when the sentry attempts to gobble up whitespace, the following situation will occur:
If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns traits::eof(), the function calls setstate(failbit | eofbit)
However, this is definitely not happening yet because the failbit isn't being set.
The consequence of the eof bit being set here is that the evil idiom while (!stream.eof()) fails when reading files only because of the extra \n at the end, and not because the eof bit isn't yet set. My compiler is happily setting the eof bit when the extraction stops at the end of file.
So should this be happening? Or did the standard mean to say that setstate(eofbit) should occur?
To make it easier, the relevant sections of the standard are:
basic_istream::sentry [istream::sentry]

std::stringstream is a basic_istream and the operator>> of std::string "extracts" characters from it (as you found out).
27.7.2.1 Class template basic_istream
2 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure (27.5.5.4), before returning.
Also, "extracting" means calling these two functions.
3 Two groups of member function signatures share common properties: the formatted input functions (or extractors) and the unformatted input functions. Both groups of input functions are described as if they obtain (or extract) input characters by calling rdbuf()->sbumpc() or rdbuf()->sgetc(). They may use other public members of istream.
So eof must be set.
Intuitively speaking, the EOF bit is set because during the read operation to extract the string, the stream did indeed hit the end of the file. Specifically, it continuously read characters out of the input stream, stopping because it hit the end of the stream before encountering a whitespace character. Accordingly, the stream set the EOF bit to mark that the end of stream was reached. Note that this is not the same as reporting failure - the operation was completed successfully - but the point of the EOF bit is not to report failure. It's to mark that the end of the stream was encountered.
I don't have a specific part of the spec to back this up, though I'll try to look for one when I get the chance.