 

Why does std::basic_istream::ignore() extract more characters than specified?

Tags: c++, iostream

I have the following code:

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main(int argc, char* argv[]) {
    stringstream buffer("1234567890 ");
    cout << "pos-before: " << buffer.tellg() << endl;
    buffer.ignore(10, ' ');
    cout << "pos-after: " << buffer.tellg() << endl;
    cout << "eof: " << buffer.eof() << endl;
}

And it produces this output:

pos-before: 0
pos-after: 11
eof: 0

I would expect pos-after to be 10, not 11. According to the specification, ignore() should stop as soon as any one of the following conditions occurs:

  1. count characters were extracted. This test is disabled in the special case when count equals std::numeric_limits<std::streamsize>::max()
  2. an end-of-file condition occurs in the input sequence, in which case the function calls setstate(eofbit)
  3. the next available character c in the input sequence is delim, as determined by Traits::eq_int_type(Traits::to_int_type(c), delim). The delimiter character is extracted and discarded. This test is disabled if delim is Traits::eof()

In this case I expect rule 1 to trigger before all the other rules and to stop when the stream position is 10.

Running the code shows that this is not the case. What did I misunderstand?

I also tried a variation of the code where I ignore only 9 characters. In this case the output is the expected one:

pos-before: 0
pos-after: 9
eof: 0

So it looks like, once ignore() has extracted count characters, it still checks whether the next character is the delimiter and, if it is, extracts it too. I can reproduce this with both g++ and clang++.

I also tried this variation of the code:

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main(int argc, char* argv[]) {
    cout << "--- 10x get\n";
    stringstream buffer("1234567890");
    cout << "pos-before: " << buffer.tellg() << '\n';
    for(int i=0; i<10; ++i)
        buffer.get();
    cout << "pos-after: " << buffer.tellg() << '\n';
    cout << "eof: " << buffer.eof() << '\n';
    
    cout << "--- ignore(10)\n";
    stringstream buffer2("1234567890");
    cout << "pos-before: " << buffer2.tellg() << '\n';
    buffer2.ignore(10);
    cout << "pos-after: " << buffer2.tellg() << '\n';
    cout << "eof: " << buffer2.eof() << '\n';
}

And the result is:

--- 10x get
pos-before: 0
pos-after: 10
eof: 0
--- ignore(10)
pos-before: 0
pos-after: -1
eof: 1

This shows that ignore() sets the end-of-file condition on the stream, which indicates that it did try to extract another character after having extracted 10 characters. But in this case the third condition is disabled, so ignore() should not have looked at the next character at all.

asked Oct 05 '20 07:10 by fjardon




2 Answers

The specification of std::basic_istream::ignore in [istream.unformatted] paragraph 25 is a bit unclear: it states "Characters are extracted until any of the following occurs:" without any indication of order. Paragraph 25.1 states that at most n characters are extracted (unless n is std::numeric_limits<std::streamsize>::max()) and paragraph 25.3 states that extraction stops when the next available character matches the delimiter, which is then extracted as well. However, even if the conditions can be applied in any order, there is no conflict here: the nth character is not the delimiter, so once it has been extracted only the first condition holds and ignore() is supposed to stop without looking at the next character.
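To make this count-first reading concrete, here is a minimal sketch of the three stop conditions written as a free function. The names spec_ignore and check_spec_ignore are hypothetical and introduced only for illustration; this is not how any particular standard library implements ignore(), and exception and badbit handling are omitted.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <limits>
#include <sstream>
#include <streambuf>

// Hypothetical helper, not part of the standard library: discards characters
// from `in` using the three stop conditions of [istream.unformatted], testing
// the count limit *before* touching the next character.
// Returns the number of characters extracted (what gcount() would report).
std::streamsize spec_ignore(std::istream& in, std::streamsize n,
                            std::istream::int_type delim) {
    using traits = std::istream::traits_type;
    std::streamsize count = 0;
    std::istream::sentry guard(in, true);  // true: do not skip whitespace
    if (!guard) return count;

    std::streambuf* sb = in.rdbuf();
    const bool unlimited = (n == std::numeric_limits<std::streamsize>::max());
    while (unlimited || count < n) {       // condition 1: n characters extracted
        const auto c = sb->sbumpc();       // extract (and discard) one character
        if (traits::eq_int_type(c, traits::eof())) {
            in.setstate(std::ios_base::eofbit);  // condition 2: end of input
            break;
        }
        ++count;
        if (traits::eq_int_type(c, delim))  // condition 3: the delimiter was just
            break;                          // extracted; never true if delim is eof()
    }
    return count;
}

// Encodes the expectations from the question; every assert should hold.
void check_spec_ignore() {
    std::istringstream a("1234567890 ");
    assert(spec_ignore(a, 10, ' ') == 10);  // stops on the count...
    assert(a.peek() == ' ');                // ...and leaves the delimiter unread
    assert(!a.eof());

    std::istringstream b("1234567890");
    assert(spec_ignore(b, 10, std::istream::traits_type::eof()) == 10);
    assert(!b.eof());  // nothing was read past the 10th character
}
```

Under this reading the nth character simply ends the loop, so the stream is never asked for character n+1: the space after position 10 stays unread, and ignoring exactly the 10 available characters does not report end-of-file.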

As was pointed out in a comment, there was/is a bug in libstdc++ which seems to be still present with the library shipping with gcc-10.2.0. Using clang++ with libc++ (if necessary, use -stdlib=libc++ when invoking clang++) doesn't show the same behavior.

As an aside: the unformatted input operations set a count of the characters read, which can be accessed using gcount(). Seeking within a stream is a considerably more expensive operation than accessing this count. Using gcount() also shows the problem (and, speaking of expensive operations, I also replaced std::endl with '\n'; see this video or this article for more details):

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::istringstream buffer("1234567890 ");
    buffer.ignore(10, ' ');
    std::cout << "gcount: " << buffer.gcount() << '\n';
    std::cout << "eof: " << std::boolalpha << buffer.eof() << '\n';
}
answered Nov 14 '22 23:11 by Dietmar Kühl


cppreference is not authoritative: for corner cases in the language you should generally not rely on it, but refer to the standard instead, which says:

Effects: Behaves as an unformatted input function (as described above). After constructing a sentry object, extracts characters and discards them. Characters are extracted until any of the following occurs:

  • n != numeric_limits<streamsize>::max() (18.3.2) and n characters have been extracted so far;
  • end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.5.5.4));
  • traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character c (in which case c is extracted).

Using "any of" here instead of "one of" makes it clear that ignore will stop if more than one of the conditions applies. That is essentially the issue here: both the first and the third conditions apply, which exposes an underspecified corner case, because the third condition states that the next available character (when it matches the delimiter) is also extracted.

So this is exactly what the library is doing in this case -- the third condition applies, so it extracts the character. The fact that the first condition also applies is immaterial.
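The behaviour described here can be modelled by applying the delimiter test to the next character after the count loop has ended. The sketch below, with the hypothetical names model_observed_ignore and check_model, is a simplified model written for illustration and is not any library's actual source. It reproduces both results from the question: 11 characters are extracted from "1234567890 ", and ignoring exactly the 10 characters of "1234567890" sets eofbit.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>
#include <streambuf>

// Hypothetical model (not any library's actual source): the loop stops after
// n characters, but the *next* character is still inspected afterwards, so the
// end-of-file and delimiter tests can fire even though n were already extracted.
// Returns the number of characters extracted.
std::streamsize model_observed_ignore(std::istream& in, std::streamsize n,
                                      std::istream::int_type delim) {
    using traits = std::istream::traits_type;
    std::streamsize count = 0;
    std::istream::sentry guard(in, true);  // true: do not skip whitespace
    if (!guard) return count;

    std::streambuf* sb = in.rdbuf();
    auto c = sb->sgetc();                  // peek at the next character
    while (count < n && !traits::eq_int_type(c, traits::eof())
                     && !traits::eq_int_type(c, delim)) {
        ++count;
        c = sb->snextc();                  // discard it and peek at the following one
    }
    if (traits::eq_int_type(c, traits::eof())) {
        in.setstate(std::ios_base::eofbit);  // the peek after the nth character hit the end
    } else if (traits::eq_int_type(c, delim)) {
        ++count;                           // condition 3 wins although
        sb->sbumpc();                      // condition 1 already holds
    }
    return count;
}

// Mirrors the two outputs reported in the question.
void check_model() {
    std::istringstream a("1234567890 ");
    assert(model_observed_ignore(a, 10, ' ') == 11);  // the trailing ' ' is consumed too
    assert(!a.eof());

    std::istringstream b("1234567890");
    assert(model_observed_ignore(b, 10, std::istream::traits_type::eof()) == 10);
    assert(b.eof());  // set even though only 10 characters were requested
}
```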

answered Nov 15 '22 00:11 by Chris Dodd