 

Should reading negative into unsigned fail via std::cin (gcc, clang disagree)?

For example,

#include <iostream>

int main() {
  unsigned n{};
  std::cin >> n;
  std::cout << n << ' ' << (bool)std::cin << std::endl;
}

When the input is -1, clang 6.0.0 outputs 0 0 while gcc 7.2.0 outputs 4294967295 1. I'm wondering which is correct. Or maybe both are correct because the standard does not specify this? By "fail", I mean that (bool)std::cin evaluates to false. clang 6.0.0 also fails on the input -0.


As of Clang 9.0.0 and GCC 9.2.0, both compilers, using either libstdc++ or libc++ in the case of Clang, agree on the result of the program above, independent of the C++ version (>= C++11) used, and print

4294967295 1

i.e. they store 4294967295 (the value of UINT_MAX for a 32-bit unsigned) and do not set the failbit on the stream.

asked Apr 19 '18 by Lingxi


2 Answers

I think that both are wrong in C++17¹ and that the expected output should be:

4294967295 0

While the returned value is correct for the latest versions of both compilers, I think that ios_base::failbit should also be set. However, there is some confusion about the notion of the field to be converted in the standard, which may account for the current behaviors.

The standard says — [facet.num.get.virtuals#3.3]:

The sequence of chars accumulated in stage 2 (the field) is converted to a numeric value by the rules of one of the functions declared in the header <cstdlib>:

  • For a signed integer value, the function strtoll.

  • For an unsigned integer value, the function strtoull.

  • For a floating-point value, the function strtold.

So we fall back to std::strtoull, which must return² ULLONG_MAX and not set errno in this case (which is what both compilers do).

But in the same block (emphasis is mine):

The numeric value to be stored can be one of:

  • zero, if the conversion function does not convert the entire field.

  • the most positive (or negative) representable value, if the field to be converted to a signed integer type represents a value too large positive (or negative) to be represented in val.

  • the most positive representable value, if the field to be converted to an unsigned integer type represents a value that cannot be represented in val.

  • the converted value, otherwise.

The resultant numeric value is stored in val. If the conversion function does not convert the entire field, or if the field represents a value outside the range of representable values, ios_base::failbit is assigned to err.

Notice that all of this talks about the "field to be converted", not the actual value returned by std::strtoull. The field here is the widened sequence of characters '-', '1'.

Since the field represents a value (-1) that cannot be represented by an unsigned, the returned value should be UINT_MAX and the failbit should be set on std::cin.


¹ clang was actually right prior to C++17 because the third bullet in the quote above was:

- the most negative representable value or zero for an unsigned integer type, if the field represents a value too large negative to be represented in val. ios_base::failbit is assigned to err.

² std::strtoull returns ULLONG_MAX because (thanks @NathanOliver) — C, 7.22.1.4.5:

If the subject sequence has the expected form and the value of base is zero, the sequence of characters starting with the first digit is interpreted as an integer constant according to the rules of 6.4.4.1. [...] If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type).

answered Oct 14 '22 by Holt


The question is really about differences between the library implementations libc++ and libstdc++, not so much about differences between the compilers (clang, gcc).

cppreference clears these inconsistencies up pretty well:

The result of converting a negative number string into an unsigned integer was specified to produce zero until c++17, although some implementations followed the protocol of std::strtoull which negates in the target type, giving ULLONG_MAX for "-1", and so produce the largest value of the target type instead. As of c++17, strictly following std::strtoull is the correct behavior.

This summarises to:

  • following std::strtoull (which yields 4294967295 here) is correct going forward, since C++17 (both implementations now do it correctly)
  • previously, the result should have been 0 under a strict reading of the standard (libc++)
  • some implementations (notably libstdc++) followed the std::strtoull protocol instead (which is now considered the correct behavior)

Why the failbit is (or isn't) set may be a more interesting question, at least from a language-lawyer perspective. As of version 7, libc++ (clang) behaves the same as libstdc++. This suggests the new behavior was chosen to match going forward (even though it goes against the letter of the pre-C++17 standard, which says the value should be zero), but so far I've been unable to find a changelog entry or documentation for this change.

The interesting block of text reads (assuming pre-C++17):

If the conversion function results in a negative value too large to fit in the type of v, the most negative representable value is stored in v, or zero for unsigned integer types.

According to this, the value is specified to be 0. Additionally, nowhere is it indicated that this should result in setting the failbit.

answered Oct 14 '22 by darune