I'm using GCC 4.9.0 on Linux. Here's my test program:
#include <iostream>
#include <string>
#include <cstdlib>
using namespace std;
int main(int argc, char* argv[])
{
size_t pos = 42;
cout << "result: " << stoi(argv[1], &pos, atoi(argv[2])) << '\n';
cout << "consumed: " << pos << '\n';
}
Here's an expected result:
$ ./a.out 100 2
result: 4
consumed: 3
That is, it parsed "100" in base 2 as the number 4 and consumed all 3 characters.
We can do the same for bases up to 36:
$ ./a.out 100 36
result: 1296
consumed: 3
But what about larger bases?
$ ./a.out 100 37
result: 0
consumed: 18446744073707449552
What's this? pos is supposed to be the index where parsing stopped. Here it's close to std::string::npos but not quite (off by a few million). And if I compile without optimization, pos is 18446744073703251929 instead, so it looks like uninitialized garbage, even though I did initialize it (to 42). And indeed, valgrind complains:
Conditional jump or move depends on uninitialised value(s)
at 0x400F11: int __gnu_cxx::__stoa<long, int, char, int>(...) (in a.out)
by 0x400EC7: std::stoi(std::string const&, unsigned long*, int) (in a.out)
So that's interesting. Also, the documentation of std::stoi says it throws std::invalid_argument if no conversion could be performed. Clearly it performed no conversion in this case, yet it returned garbage in pos and threw no exception.
Similar bad things happen if base is 1 or negative.
Is this a bug in the GCC implementation, a bug in the standard, or just something we have to learn to live with? I thought one of the goals of stoi() vs atoi() was better error detection, but it seems not to check base at all.
Edit: here's a C version of the same program which also prints errno:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[])
{
char* pos = (char*)42;
printf("result: %ld\n", strtol(argv[1], &pos, atoi(argv[2])));
printf("consumed: %lu (%p)\n", (unsigned long)(pos - argv[1]), (void*)pos);
perror("errno");
return 0;
}
When it works, it does the same thing as before. When it fails, the cause is much clearer:
$ ./a.out 100 37
result: 0
consumed: 18446603340345143502 (0x2a)
errno: Invalid argument
Now we see why pos in the C++ version was a "garbage" value: strtol() left endptr unchanged, and the C++ wrapper erroneously subtracts the input string's starting address from it.
In the C version we also see that errno is set to EINVAL to indicate the error. The documentation on my system says this happens when base is invalid, but also says it's not specified by C99. If we print errno in the C++ version we can detect this error there too (but it's not standard in C99 and it sure isn't specified by C++11).
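To make the pointer arithmetic concrete, here is a minimal sketch of how a "characters consumed" index can be derived from strtol()'s end pointer. The helper name consumed_by_strtol is hypothetical and this is not the libstdc++ source; it only illustrates the subtraction. It also points the end pointer at the start of the buffer before the call, so that an implementation which leaves the pointer untouched (as observed above) yields 0 instead of garbage. Whether strtol() touches the pointer for an invalid base is not specified, so this is a mitigation under that assumption, not a guarantee.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper (not the libstdc++ implementation): returns how many
// characters strtol() consumed from `s`, optionally storing the parsed value.
std::size_t consumed_by_strtol(const std::string& s, int base, long* value = nullptr)
{
    const char* start = s.c_str();
    // Point `end` at the start of the buffer. If strtol() never writes to it,
    // the subtraction below gives 0 rather than an unrelated address difference.
    char* end = const_cast<char*>(start);
    long v = std::strtol(start, &end, base);
    if (value)
        *value = v;
    // The consumed count is the distance from the start to where parsing stopped.
    return static_cast<std::size_t>(end - start);
}
```

For a valid base this matches what stoi() reports through pos: parsing "100" in base 2 consumes 3 characters.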
[C++11: 21.5/3]:
Throws: invalid_argument if strtol, strtoul, strtoll, or strtoull reports that no conversion could be performed. [..]
[C99: 7.20.1.4/5]:
If the subject sequence has the expected form and the value of base is zero, the sequence of characters starting with the first digit is interpreted as an integer constant according to the rules of 6.4.4.1. If the subject sequence has the expected form and the value of base is between 2 and 36, it is used as the base for conversion, ascribing to each letter its value as given above. [..]
No semantics are specified in C99 for the case when base is not zero or between 2 and 36, so the result is undefined. This does not necessarily satisfy the excerpt from [C++11: 21.5/3].
In short, this is UB; you'd expect an exception only when the base is valid but the input value is inconvertible in that base. This is a bug in neither GCC nor the standard.
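Since the library is not required to check base, the caller can do it before the call. A minimal sketch, assuming a hypothetical wrapper named checked_stoi (not part of the standard library) that accepts only the bases for which strtol() has defined semantics, namely 0 and 2 through 36:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: rejects any base for which strtol() has no defined
// semantics, then forwards to std::stoi.
int checked_stoi(const std::string& s, std::size_t* pos = nullptr, int base = 10)
{
    if (base != 0 && (base < 2 || base > 36))
        throw std::invalid_argument("checked_stoi: base must be 0 or in [2, 36]");
    // With a valid base, std::stoi behaves as specified: it throws
    // std::invalid_argument if no conversion could be performed and
    // std::out_of_range if the value does not fit in an int.
    return std::stoi(s, pos, base);
}
```

With this in place, a base of 37, 1, or a negative number produces an exception instead of an indeterminate pos.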