Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reading with setw: to eof or not to eof?

Consider the following simple example

#include <string>
#include <sstream>
#include <iomanip>

using namespace std;

int main() {
  string str = "string";
  istringstream is(str);
  is >> setw(6) >> str;
  return is.eof();
}

At the first sight, since the explicit width is specified by the setw manipulator, I'd expect the >> operator to finish reading the string after successfully extracting the requested number of characters from the input stream. I don't see any immediate reason for it to try to extract the seventh character, which means that I don't expect the stream to enter eof state.

When I run this example under MSVC++, it works as I expect it to: the stream remains in good state after reading. However, in GCC the behavior is different: the stream ends up in eof state.

The language standard, it gives the following list of completion conditions for this version of >> operator

  • n characters are stored;
  • end-of-file occurs on the input sequence;
  • isspace(c,is.getloc()) is true for the next available input character c.

Given the above, I don't see any reason for the >> operator to drive the stream into the eof state in the above code.

However, this is what the >> operator implementation in GCC library looks like

...
__int_type __c = __in.rdbuf()->sgetc();

while (__extracted < __n
       && !_Traits::eq_int_type(__c, __eof)
       && !__ct.is(__ctype_base::space,
                   _Traits::to_char_type(__c)))
{
  if (__len == sizeof(__buf) / sizeof(_CharT))
  {
    __str.append(__buf, sizeof(__buf) / sizeof(_CharT));
    __len = 0;
  }
  __buf[__len++] = _Traits::to_char_type(__c);
  ++__extracted;
  __c = __in.rdbuf()->snextc();
}
__str.append(__buf, __len);

if (_Traits::eq_int_type(__c, __eof))
  __err |= __ios_base::eofbit;
__in.width(0);
...

As you can see, at the end of each successful iteration, it attempts to prepare the next __c character for the next iteration, even though the next iteration might never occur. And after the cycle it analyzes the last value of that __c character and sets the eofbit accordingly.

So, my question is: triggering the eof stream state in the above situation, as GCC does - is it legal from the standard point of view? I don't see it explicitly specified in the document. Is both MSVC's and GCC's behavior compliant? Or is only one of them behaving correctly?

like image 931
AnT Avatar asked Oct 20 '14 18:10

AnT


1 Answers

The definition for that particular operator>> is not relevant to the setting of the eofbit, as it only describes when the operation terminates, but not what triggers a particular bit.

The description for the eofbit in the standard (draft) says:

eofbit - indicates that an input operation reached the end of an input sequence;

I guess here it depends on how you want to interpret "reached". Note that gcc implementation correctly does not set failbit, which is defined as

failbit - indicates that an input operation failed to read the expected characters, or that an output operation failed to generate the desired characters.

So I think eofbit does not necessarily mean that the end of file impeded the extractions of any new characters, just that the end of file has been "reached".

I can't seem to find a more accurate description for "reached", so I guess that would be implementation defined. If this logic is correct, then both MSVC and gcc behaviors are correct.


EDIT: In particular, it seems that eofbit gets set when sgetc() would return eof. This is described both in the istreambuf_iterator section and in the basic_istream::sentry section. So now the question is: when is the current position of the stream allowed to advance?


FINAL EDIT: It turns out that probably g++ has the correct behavior.

Every character scan passes through <locale>, in order to allow different character sets, money formats, time descriptions and number formats to be parsed. While there does not seem to be a through description on how the operator>> works for strings, there are very specific descriptions on how do_get functions for numbers, time and money are supposed to operate. You can find them from page 687 of the draft forward.

All of these start off by reading a ctype (the "global" version of a character, as read through locales) from a istreambuf_iterator (for numbers, you can find the call definitions at page 1018 of the draft). Then the ctype is processed, and finally the iterator is advanced.

So, in general, this requires the internal iterator to always point to the next character after the last one read; if that was not the case you could in theory extract more than you wanted:

string str = "strin1";
istringstream is(str);
is >> setw(6) >> str;
int x;
is >> x;

If the current character for is after the extraction for str was not on the eof, then the standard would require that x gets the value 1, since for numeric extraction the standard explicitly requires that the iterator is advanced after the first read.

Since this does not make much sense, and given that all complex extractions described in the standard behave in the same way, it makes sense that for strings the same would happen. Thus, as the pointer for is after reading 6 characters falls on the eof, the eofbit needs to be set.

like image 89
Svalorzen Avatar answered Sep 29 '22 23:09

Svalorzen