
Why does string extraction from a stream set the eof bit?

Let's say we have a stream containing simply:

hello

Note that there's no extra \n at the end like there often is in a text file. Now, the following simple code shows that the eof bit is set on the stream after extracting a single std::string.

#include <iostream>
#include <sstream>
#include <string>

int main()
{
  std::stringstream ss("hello");
  std::string result;
  ss >> result;
  std::cout << ss.eof() << std::endl; // Outputs 1
  return 0;
}

However, I can't see why this would happen according to the standard (I'm reading C++11 - ISO/IEC 14882:2011(E)). operator>>(basic_istream<...>&, basic_string<...>&) is defined as behaving like a formatted input function. This means it constructs a sentry object, which proceeds to consume any leading whitespace characters. In this example there are none, so the sentry construction completes with no problems. When converted to bool, the sentry object gives true, so the extractor goes on to perform the actual extraction of the string.

The extraction is then defined as:

Characters are extracted and appended until any of the following occurs:

  • n characters are stored;
  • end-of-file occurs on the input sequence;
  • isspace(c,is.getloc()) is true for the next available input character c.

After the last character (if any) is extracted, is.width(0) is called and the sentry object k is destroyed. If the function extracts no characters, it calls is.setstate(ios::failbit), which may throw ios_base::failure (27.5.5.4).

Nothing here actually causes the eof bit to be set. Yes, extraction stops if it hits the end-of-file, but it doesn't set the bit. In fact, the eof bit should only be set if we perform a second ss >> result, because when the sentry attempts to gobble up whitespace, the following situation will occur:

If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns traits::eof(), the function calls setstate(failbit | eofbit)

However, this is definitely not happening yet because the failbit isn't being set.

One consequence of the eof bit being set here is that the only reason the evil idiom while (!stream.eof()) fails when reading files is the extra \n at the end, not that the eof bit hasn't been set yet. My compiler is happily setting the eof bit when the extraction stops at the end of file.

So should this be happening? Or did the standard mean to say that setstate(eofbit) should occur?


To make it easier, the relevant sections of the standard are:

  • 21.4.8.9 Inserters and extractors [string.io]
  • 27.7.2.2 Formatted input functions [istream.formatted]
  • 27.7.2.1.3 Class basic_istream::sentry [istream::sentry]
Joseph Mansfield asked Jan 29 '13 20:01


2 Answers

std::stringstream is a basic_istream and the operator>> of std::string "extracts" characters from it (as you found out).

27.7.2.1 Class template basic_istream

2 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure (27.5.5.4), before returning.

Also, "extracting" means calling these two functions.

3 Two groups of member function signatures share common properties: the formatted input functions (or extractors) and the unformatted input functions. Both groups of input functions are described as if they obtain (or extract) input characters by calling rdbuf()->sbumpc() or rdbuf()->sgetc(). They may use other public members of istream.

So eof must be set.

ipc answered Nov 11 '22 23:11


Intuitively speaking, the EOF bit is set because during the read operation to extract the string, the stream did indeed hit the end of the file. Specifically, it continuously read characters out of the input stream, stopping because it hit the end of the stream before encountering a whitespace character. Accordingly, the stream set the EOF bit to mark that the end of stream was reached. Note that this is not the same as reporting failure - the operation was completed successfully - but the point of the EOF bit is not to report failure. It's to mark that the end of the stream was encountered.

I don't have a specific part of the spec to back this up, though I'll try to look for one when I get the chance.

templatetypedef answered Nov 12 '22 00:11