Where is prefix-dependent integer parsing defined?

I have a simple test program (error checks removed):

#include <iostream>
#include <iomanip>
#include <sstream>
#include <string>

int main() {
    std::string line;
    while(std::cin >> line) {
        int value;
        std::stringstream stream(line);

        stream >> std::setbase(0) >> value;

        std::cout << "You typed: " << value << std::endl;
    }

}

This works great for prefix-dependent integer parsing: it parses strings starting with "0x" or "0X" as hexadecimal and strings starting with '0' as octal. Several resources I use explain this behavior. What I haven't been able to find, though, is anything in the C++ standard that guarantees it.

I imagine the extraction operators use strtol under the hood. Section 7.20.1.4p3 of the C standard says this about strtol (6.4.4.1 is the syntax for integer constants):

If the value of base is zero, the expected form of the subject sequence is that of an integer constant as described in 6.4.4.1, optionally preceded by a plus or minus sign, but not including an integer suffix.

This works on the couple of versions of GCC that I've tried, but is it safe to use generally?

Collin asked Nov 02 '12 13:11


2 Answers

setbase is defined in C++98 [lib.std.manip]/5 (paraphrasing slightly):

smanip setbase(int base);

Returns: An object s of unspecified type such that [inserting or extracting s from a stream behaves as if the following function were called on that stream:]

ios_base& f(ios_base& str, int base)
{
    str.setf(base == 8 ? ios_base::oct :
             base == 10 ? ios_base::dec :
             base == 16 ? ios_base::hex :
             ios_base::fmtflags(0), ios_base::basefield);
    return str;
}

Okay, so, if base is not 8, 10, or 16, then the basefield flags are cleared. The effect of a cleared basefield for input is defined in [lib.facet.num.get.virtuals], table 55 ("Integer conversions") as equivalent to sscanf("%i") on the sequence of characters next available.
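As a quick sanity check of those two steps, here is a minimal sketch. It is not taken from the standard, and the helper names basefield_cleared_by and parse_with_setbase0 are made up for illustration. It checks that extracting setbase(base) leaves the basefield flags cleared when base is not 8, 10, or 16, and that extraction with a cleared basefield picks the radix from the prefix:

```cpp
#include <iomanip>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: reports whether all basefield flags are clear
// after std::setbase(base) has been extracted from a fresh stream.
bool basefield_cleared_by(int base) {
    std::istringstream stream("0");
    stream >> std::setbase(base);
    return (stream.flags() & std::ios_base::basefield) ==
           std::ios_base::fmtflags(0);
}

// Hypothetical helper: extracts an int after setbase(0).
// Returns `fallback` if the extraction fails.
int parse_with_setbase0(const std::string& text, int fallback = -1) {
    std::istringstream stream(text);
    int value = 0;
    if (!(stream >> std::setbase(0) >> value)) {
        return fallback;
    }
    return value;
}
```

On a conforming implementation I would expect basefield_cleared_by(0) to be true, basefield_cleared_by(10) to be false, and parse_with_setbase0("0x1A") to give 26.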

C++98 refers to C89 for the definition of *scanf, naturally enough. I don't have a PDF copy of C89, but I do have C99, in which section 7.19.6.2 paragraph 12 [the C standard does not have the nice symbolic section names that the C++ standard has] defines "%i" to behave the same as strtol with base argument 0.
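To see the "%i" and strtol-with-base-0 correspondence directly at the C library level, a small sketch like the following can help. The wrapper names via_strtol and via_sscanf are illustrative, not anything from either standard:

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical wrapper: base 0 tells strtol to infer the radix from the
// prefix ("0x"/"0X" is hex, a leading '0' is octal, otherwise decimal).
long via_strtol(const char* text) {
    return std::strtol(text, nullptr, 0);
}

// Hypothetical wrapper: the same conversion through sscanf's %i.
// Returns `fallback` if nothing could be converted.
int via_sscanf(const char* text, int fallback = -1) {
    int value = 0;
    return std::sscanf(text, "%i", &value) == 1 ? value : fallback;
}
```

Both wrappers should agree on small inputs such as "0x1A", "017", "42", and "-0x10".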

So the good news is, prefix-dependent integer scanning is guaranteed by the standard after setbase(0). The bad news is, iostream formatted input is defined in terms of *scanf, which means the dreadful sentence at the end of C99 7.19.6.2p10 applies:

If [the object that receives the result of scanning] does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.

Put more plainly: input overflow triggers undefined behavior. The C(++) runtime is allowed to crash the program if input to *scanf has too many digits! This is one of several reasons why I and others keep saying *scanf should never be used, and now I have to start saying it about istream >> int as well. :-(

The advice that holds for C is even easier to apply in C++: Read entire lines with std::getline and parse them by hand. Use the strtol family of functions to convert numeric input to machine numbers. (Those functions have predictable behavior on overflow.)
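A minimal sketch of that approach might look like this. The function name parse_int_line and its policy of rejecting trailing characters are my own choices for illustration, not something the standard prescribes:

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Hypothetical helper: parses the whole of `line` as an int using strtol
// with base 0 (prefix-dependent radix). Returns true and writes `out` only
// if the line holds exactly one number that fits in an int.
bool parse_int_line(const std::string& line, int& out) {
    const char* begin = line.c_str();
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(begin, &end, 0);
    if (end == begin) return false;       // no digits were consumed
    if (*end != '\0') return false;       // trailing characters after the number
    if (errno == ERANGE) return false;    // value does not fit in a long
    if (value < INT_MIN || value > INT_MAX) return false;  // fits long, not int
    out = static_cast<int>(value);
    return true;
}
```

Typical use would be to call std::getline(std::cin, line) in a loop and pass each line to parse_int_line, reporting an error whenever it returns false. Note that strtol skips leading whitespace, so this sketch accepts it, while trailing whitespace is rejected.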

zwol answered Oct 03 '22 02:10


C++11 §22.4.2.1.2/3, Table 85:

For conversion to an integral type, the function determines the integral conversion specifier as indicated in Table 85. The table is ordered. That is, the first line whose condition is true applies.

Table 85 — Integer conversions
State                    stdio equivalent
basefield == oct         %o
basefield == hex         %X
basefield == 0           %i
signed integral type     %d
unsigned integral type   %u

The %i conversion format for scanf and company does prefix-dependent conversion.
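To watch the rows of the table in action, here is a small sketch that sets basefield explicitly before extracting. The helper name parse_with_basefield is made up for the example:

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: extracts an int from `text` with the stream's
// basefield set to `base`. Pass std::ios_base::fmtflags(0) to clear it.
int parse_with_basefield(const std::string& text,
                         std::ios_base::fmtflags base) {
    std::istringstream stream(text);
    stream.setf(base, std::ios_base::basefield);
    int value = -1;
    stream >> value;
    return value;
}
```

With basefield cleared, "017" should come back as 15 (octal, as with %i), whereas with std::ios_base::dec the same text should come back as 17.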

Jerry Coffin answered Oct 03 '22 02:10