
C++11 std::stoi silently fails when base not in [2,36] (GCC)

Tags: c++, std, gcc, c++11

I'm using GCC 4.9.0 on Linux. Here's my test program:

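#include <cstdlib>   // atoi
#include <iostream>
#include <string>

using namespace std;

int main(int argc, char* argv[])
{
  // Usage: ./a.out <digits> <base>
  if (argc != 3)
    return 1;

  size_t pos = 42;  // sentinel, to show whether stoi overwrites it
  cout << "result: " << stoi(argv[1], &pos, atoi(argv[2])) << '\n';
  cout << "consumed: " << pos << '\n';
}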

Here's an expected result:

$ ./a.out 100 2
result: 4
consumed: 3

That is, it parsed "100" in base 2 as the number 4 and consumed all 3 characters.

We can do similar up to base 36:

$ ./a.out 100 36
result: 1296
consumed: 3

But what about larger bases?

$ ./a.out 100 37
result: 0
consumed: 18446744073707449552

What's this? pos is supposed to be the index where parsing stopped. Here it's close to std::string::npos but not quite (off by a few million). If I compile without optimization, pos is 18446744073703251929 instead, so it looks like uninitialized garbage, even though I initialized it (to 42). And indeed, valgrind complains:

Conditional jump or move depends on uninitialised value(s)
  at 0x400F11: int __gnu_cxx::__stoa<long, int, char, int>(...) (in a.out)
  by 0x400EC7: std::stoi(std::string const&, unsigned long*, int) (in a.out)

So that's interesting. Also, the documentation of std::stoi says it throws std::invalid_argument if no conversion could be performed. Clearly no conversion was performed here, yet it returned garbage in pos and threw no exception.

Similar bad things happen if base is 1 or negative.

Is this a bug in the GCC implementation, a bug in the standard, or just something we have to learn to live with? I thought one of the goals of stoi() vs atoi() was better error detection, but it seems not to check base at all.


Edit: here's a C version of the same program which also prints errno:

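#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
  char* pos = (char*)42;  /* sentinel, to show whether strtol overwrites it */
  errno = 0;              /* clear any stale value before the call */
  printf("result: %ld\n", strtol(argv[1], &pos, atoi(argv[2])));
  /* casts make the arguments match the %lu and %p conversions */
  printf("consumed: %lu (%p)\n", (unsigned long)(pos - argv[1]), (void*)pos);
  perror("errno");
  return 0;
}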

When it works, it does the same thing as before. When it fails, it's much clearer what happened:

$ ./a.out 100 37
result: 0
consumed: 18446603340345143502 (0x2a)
errno: Invalid argument

Now we see why pos in the C++ version was a "garbage" value: it was because strtol() left endptr unchanged, and the C++ wrapper erroneously subtracts the input string starting address from that.
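To make that reasoning concrete, here is a simplified model of the wrapper logic. It is a sketch under assumptions, not the libstdc++ source: model_wrapper, via_strtol, leaves_end_untouched and Outcome are all hypothetical names, and the stand-in converter just imitates a strtol that returns without writing endptr, as the C program above observed.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Outcome of one call through the modelled wrapper.
enum class Outcome { converted, no_conversion, end_never_written };

// Forwards to the real strtol.
long via_strtol(const char* s, char** end, int base)
{
    return std::strtol(s, end, base);
}

// Hypothetical stand-in for a strtol that rejects the base and returns 0
// without ever writing to *end.
long leaves_end_untouched(const char*, char**, int)
{
    return 0;
}

// Simplified model of the wrapper (an assumption, not the libstdc++ source).
// Unlike the wrapper in the question, `end` starts as a null sentinel so
// that this sketch never reads an indeterminate pointer.
Outcome model_wrapper(long (*conv)(const char*, char**, int),
                      const std::string& str, std::size_t* idx, int base)
{
    const char* begin = str.c_str();
    char* end = nullptr;
    conv(begin, &end, base);

    if (end == nullptr)
        return Outcome::end_never_written;  // case an "end == begin" test misses
    if (end == begin)
        return Outcome::no_conversion;      // where invalid_argument would be thrown
    if (idx)
        *idx = static_cast<std::size_t>(end - begin);
    return Outcome::converted;
}
```

Without the sentinel, the third case cannot be told apart from a successful conversion: end is indeterminate, the "end == begin" comparison almost certainly fails, no exception is thrown, and the subtraction produces garbage. That would also fit the valgrind complaint about a conditional jump on an uninitialised value.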

In the C version we also see that errno is set to EINVAL to indicate the error. The documentation on my system says this will happen when base is invalid, but also says it's not specified by C99. If we print errno in the C++ version we can also detect this error (but it's not standard in C99 and it sure isn't specified by C++11).

John Zwinck asked Jul 01 '14 07:07



1 Answer

[C++11: 21.5/3]: Throws: invalid_argument if strtol, strtoul, strtoll, or strtoull reports that no conversion could be performed. [..]

[C99: 7.20.1.4/5]: If the subject sequence has the expected form and the value of base is zero, the sequence of characters starting with the first digit is interpreted as an integer constant according to the rules of 6.4.4.1. If the subject sequence has the expected form and the value of base is between 2 and 36, it is used as the base for conversion, ascribing to each letter its value as given above. [..]

C99 specifies no semantics for the case where base is neither zero nor between 2 and 36, so the behavior is undefined. Undefined behavior is not the same as strtol reporting that no conversion could be performed, so the Throws clause in [C++11: 21.5/3] does not necessarily apply.

In short, this is UB; you'd expect an exception only when the base is valid but the input value is inconvertible in that base. This is a bug in neither GCC nor the standard.
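If you want a diagnosable error instead of UB, one option is to validate the base yourself before calling std::stoi. A minimal sketch follows; checked_stoi is a hypothetical helper name, not part of the standard library, and throwing std::invalid_argument for a bad base is a design choice made here rather than anything the standard prescribes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: rejects bases for which strtol has no defined
// behaviour, then defers to std::stoi for everything else.
int checked_stoi(const std::string& str, std::size_t* pos = nullptr, int base = 10)
{
    // Only base 0 (auto-detect) and bases 2 through 36 are specified.
    if (base != 0 && (base < 2 || base > 36))
        throw std::invalid_argument("checked_stoi: base must be 0 or in [2, 36]");

    // std::stoi still throws invalid_argument or out_of_range as usual.
    return std::stoi(str, pos, base);
}
```

With this in place, checked_stoi("100", &pos, 37) throws before strtol is ever reached and leaves pos untouched, while valid bases behave exactly like std::stoi.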

Lightness Races in Orbit answered Oct 25 '22 13:10
