Consider the following simple example:
#include <string>
#include <sstream>
#include <iomanip>
using namespace std;
int main() {
    string str = "string";
    istringstream is(str);
    is >> setw(6) >> str;
    return is.eof();
}
At first sight, since an explicit width is specified by the setw manipulator, I'd expect the >> operator to finish reading the string after successfully extracting the requested number of characters from the input stream. I don't see any immediate reason for it to try to extract a seventh character, which means that I don't expect the stream to enter the eof state.
When I run this example under MSVC++, it works as I expect: the stream remains in a good state after reading. With GCC, however, the behavior is different: the stream ends up in the eof state.
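To make the difference easier to observe than through the process exit code, the extraction can be wrapped in a small helper that reports the resulting flags. This is only a sketch: `ExtractionResult` and `extract_with_width` are names made up for illustration, and the value of `eof` for an input of exactly `width` characters is the part that differs between the two libraries.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper type (not from the original post): the outcome of one
// width-limited string extraction.
struct ExtractionResult {
    std::string text;  // characters stored by operator>>
    bool eof;          // is.eof() after the extraction (library dependent
                       // when the input has exactly `width` characters)
    bool fail;         // is.fail() after the extraction
};

// Runs `is >> setw(width) >> text` on a fresh stream and reports the state.
ExtractionResult extract_with_width(const std::string& input, int width) {
    std::istringstream is(input);
    std::string text;
    is >> std::setw(width) >> text;
    return {text, is.eof(), is.fail()};
}
```

Calling `extract_with_width("string", 6)` reproduces the case from the question; calling it with a longer input such as `"string!"` leaves a character after the sixth one, so the end of the sequence is not involved at all.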
The language standard gives the following list of completion conditions for this version of the >> operator:
- n characters are stored;
- end-of-file occurs on the input sequence;
- isspace(c,is.getloc()) is true for the next available input character c.
Given the above, I don't see any reason for the >> operator to drive the stream into the eof state in the above code.
However, this is what the >> operator implementation in the GCC library looks like:
...
__int_type __c = __in.rdbuf()->sgetc();
while (__extracted < __n
       && !_Traits::eq_int_type(__c, __eof)
       && !__ct.is(__ctype_base::space,
                   _Traits::to_char_type(__c)))
  {
    if (__len == sizeof(__buf) / sizeof(_CharT))
      {
        __str.append(__buf, sizeof(__buf) / sizeof(_CharT));
        __len = 0;
      }
    __buf[__len++] = _Traits::to_char_type(__c);
    ++__extracted;
    __c = __in.rdbuf()->snextc();
  }
__str.append(__buf, __len);
if (_Traits::eq_int_type(__c, __eof))
  __err |= __ios_base::eofbit;
__in.width(0);
...
As you can see, at the end of each successful iteration it fetches the next __c character in preparation for the next iteration, even though that iteration might never occur. After the loop it examines the last value of __c and sets the eofbit accordingly.
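The effect of that look-ahead can be reproduced outside the library. The following is a simplified re-creation of the loop's shape, not the libstdc++ source: `read_with_lookahead` is a hypothetical name, it works on plain `char`, and it uses `std::isspace` in place of the locale's ctype facet.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <streambuf>
#include <string>

// Simplified illustration of the loop shown above: look at the current
// character with sgetc(), store it, then advance with snextc(), which also
// returns the character that follows.
// Returns true if the last look-ahead saw end-of-file, which is the
// condition under which the GCC code sets eofbit.
bool read_with_lookahead(std::streambuf& buf, std::size_t n, std::string& out) {
    using traits = std::char_traits<char>;
    const traits::int_type eof = traits::eof();

    std::size_t extracted = 0;
    traits::int_type c = buf.sgetc();
    while (extracted < n
           && !traits::eq_int_type(c, eof)
           && !std::isspace(static_cast<unsigned char>(traits::to_char_type(c)))) {
        out += traits::to_char_type(c);
        ++extracted;
        c = buf.snextc();  // fetched even if the loop is about to stop
    }
    return traits::eq_int_type(c, eof);
}
```

Passing a `std::stringbuf` holding `"string"` with `n = 6` stores all six characters and returns true, because the sixth call to `snextc()` already finds the end of the sequence. With `"string!"` the same call returns false.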
So, my question is: is triggering the eof stream state in the above situation, as GCC does, legal from the standard's point of view? I don't see it explicitly specified in the document. Are both MSVC's and GCC's behaviors compliant? Or is only one of them behaving correctly?
The definition of that particular operator>> is not relevant to the setting of the eofbit, as it only describes when the operation terminates, not what triggers a particular bit.
The description of the eofbit in the standard (draft) says:
eofbit - indicates that an input operation reached the end of an input sequence;
I guess here it depends on how you want to interpret "reached". Note that the gcc implementation correctly does not set failbit, which is defined as
failbit - indicates that an input operation failed to read the expected characters, or that an output operation failed to generate the desired characters.
So I think eofbit does not necessarily mean that the end of file prevented the extraction of any new characters, just that the end of file has been "reached".
I can't seem to find a more precise description of "reached", so I guess that would be implementation defined. If this logic is correct, then both the MSVC and gcc behaviors are correct.
EDIT: In particular, it seems that eofbit gets set when sgetc() would return eof. This is described both in the istreambuf_iterator section and in the basic_istream::sentry section. So now the question is: when is the current position of the stream allowed to advance?
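The distinction between consuming the last character and looking past it can be seen with unformatted input as well. In the sketch below (the names `EofProbe` and `probe_eof` are invented for illustration), `read()` consumes exactly the available characters, and a following `peek()` is what makes `sgetc()` return eof.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper type: stream state at two points in time.
struct EofProbe {
    bool eof_after_read;   // eofbit after reading exactly input.size() chars
    bool eof_after_peek;   // eofbit after looking at the next character
    bool fail_after_peek;  // is.fail() after the peek
};

EofProbe probe_eof(const std::string& input) {
    std::istringstream is(input);
    std::string buf(input.size(), '\0');

    // Consume exactly the available characters; nothing is looked at beyond them.
    is.read(&buf[0], static_cast<std::streamsize>(buf.size()));
    const bool eof_after_read = is.eof();

    // peek() asks the buffer for the next character without consuming it.
    is.peek();
    return {eof_after_read, is.eof(), is.fail()};
}
```

For `probe_eof("string")`, the stream is not yet in the eof state after the `read()`, and enters it only once `peek()` finds nothing left, without failbit being set.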
FINAL EDIT: It turns out that g++ probably has the correct behavior.
Every character scan passes through <locale>, in order to allow different character sets, money formats, time descriptions and number formats to be parsed. While there does not seem to be a thorough description of how operator>> works for strings, there are very specific descriptions of how the do_get functions for numbers, time and money are supposed to operate. You can find them from page 687 of the draft onward.
All of these start off by reading a ctype (the "global" version of a character, as read through locales) from an istreambuf_iterator (for numbers, you can find the call definitions at page 1018 of the draft). Then the ctype is processed, and finally the iterator is advanced.
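The read, process, advance pattern can be sketched with a toy digit parser. This is not how num_get is implemented; `parse_digits` is a hypothetical function that only mirrors the order of operations on an `istreambuf_iterator`.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Toy parser (an illustration only): read the current character through the
// iterator, process it, then advance. When the loop stops, the iterator
// refers to the first character that was NOT consumed.
unsigned parse_digits(std::istream& in, bool& at_end) {
    std::istreambuf_iterator<char> it(in), end;
    unsigned value = 0;
    while (it != end && *it >= '0' && *it <= '9') {
        value = value * 10 + static_cast<unsigned>(*it - '0');  // process
        ++it;                                                   // advance
    }
    at_end = (it == end);  // true when the last digit was also the last character
    return value;
}
```

With the input `"12"` the iterator compares equal to the end iterator after the second digit; with `"12x"` it stops on the `x`, which stays available to the next read.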
So, in general, this requires the internal iterator to always point to the character after the last one read; if that were not the case, you could in theory extract more than you wanted:
string str = "strin1";
istringstream is(str);
is >> setw(6) >> str;
int x;
is >> x;
If the current position of is after the extraction of str were not at the eof, then the standard would require that x gets the value 1, since for numeric extraction the standard explicitly requires that the iterator is advanced only after the first read.
Since this does not make much sense, and given that all the complex extractions described in the standard behave in the same way, it makes sense that the same would happen for strings. Thus, as the position of is after reading 6 characters falls on the eof, the eofbit needs to be set.
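As a sanity check of the scenario above, the two extractions can be run back to back. The sketch below uses invented names (`FollowUp`, `read_string_then_int`) and reports only what should hold regardless of whether the library sets eofbit after the string extraction: the string takes all six characters, so no digit is left for the integer.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper type: result of a width-limited string extraction
// followed by an int extraction from the same stream.
struct FollowUp {
    std::string text;  // what the string extraction stored
    int value;         // what the int extraction stored, if it succeeded
    bool int_failed;   // is.fail() after the int extraction
};

FollowUp read_string_then_int(const std::string& input, int width) {
    std::istringstream is(input);
    std::string text;
    int value = 0;
    is >> std::setw(width) >> text;
    is >> value;
    return {text, value, is.fail()};
}
```

`read_string_then_int("strin1", 6)` stores `"strin1"` in the string and the integer extraction fails, since the `1` has already been consumed. For contrast, `read_string_then_int("strin 1", 6)` stops the string at the space and the integer extraction then succeeds with 1.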