
Robust parsing of integers in C++ [closed]

Tags: c++, parsing

I'm trying to write a helper function that can be used for parsing integers from config files and from a text-based protocol (written by machine, not by a human). I've read "How to parse a string to an int in C++?" but the solutions there don't address all the issues. I would like something that will (from most to least important):

  1. Reject out-of-range values. strtoul and strtoull don't quite achieve this: given a leading minus sign, the value is negated "in the return type". So "-5" is happily parsed and returns 4294967291 or 18446744073709551611 instead of signalling an error.
  2. Be in the C locale, regardless of the global locale setting (or even better, give me a choice). Unless there is a way to set the global locale on a per-thread basis, that rules out strtoul, stoul and boost::lexical_cast, and leaves only istringstream (where one can imbue a locale).
  3. Be reasonably strict. It definitely must not accept trailing garbage, and ideally I'd like to ban white space as well. That immediately makes strtol and anything based on it a little problematic. It seems that istringstream can work here using noskipws and checking for EOF, although that might just be a GCC bug.
  4. Ideally give some control whether the base should be assumed to be 10 or should be inferred from a 0 or 0x prefix.
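To make points 1 and 3 concrete, here is a small sketch of the behaviour being described. The helper name `naive_strtoul` is made up for illustration; it applies only the usual error checks (`errno`, the end pointer) to plain `strtoul`.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parse with plain strtoul and report whether any of
// the usual error signals fired (ERANGE, no digits consumed, trailing text).
unsigned long naive_strtoul(const char* text, bool& error_reported) {
    errno = 0;
    char* end = nullptr;
    unsigned long value = std::strtoul(text, &end, 10);
    error_reported = (errno == ERANGE) || (end == text) || (*end != '\0');
    return value;
}
```

With this helper, "-5" comes back as the value 0UL - 5UL with no error reported, and "  42" is accepted as 42 because leading whitespace is skipped. Only trailing garbage such as "42abc" is caught by the end-pointer check.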

Any ideas on a solution? Is there an easy way to wrap the existing parsing machinery to meet these requirements, or is it going to end up being less work to write the parser myself?

Bruce Merry asked Oct 28 '13 18:10


2 Answers

There are some quick hacks: parse as normal (non-robustly) and add a few small checks on the input. For example, when parsing a non-negative number, check that the input does not contain a '-' character.
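A minimal sketch of this kind of quick hack, assuming an unsigned decimal value is wanted. The function name `parse_ulong_strict` is invented here. It still calls `strtoul`, so it does not address the locale requirement from the question; it only adds checks around the call.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical wrapper: reject anything that does not start with an ASCII
// digit (this rules out '-', '+', and leading whitespace), then let strtoul
// do the conversion and check its error signals.
bool parse_ulong_strict(const std::string& text, unsigned long& out) {
    if (text.empty() || text[0] < '0' || text[0] > '9') {
        return false;
    }
    errno = 0;
    char* end = nullptr;
    unsigned long value = std::strtoul(text.c_str(), &end, 10);
    if (errno == ERANGE) {
        return false;  // value does not fit in unsigned long
    }
    if (end != text.c_str() + text.size()) {
        return false;  // trailing garbage (or an embedded NUL)
    }
    out = value;
    return true;
}
```

Note that this sketch still accepts leading zeros ("007" parses as 7). Parsing into a narrower type would need an extra range check on the result.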

The ultimate test of robustness is to convert the integer back to text and check that the output text is the same as the input text. If that is too strict, you can relax the comparison on the text side, for example by accepting leading zeros or spaces.
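A sketch of the round-trip idea for a signed value, assuming `strtoll` for parsing and `std::to_string` for printing (both choices are mine, as is the name `parse_ll_roundtrip`). No relaxation is applied, so only the canonical decimal spelling is accepted.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: parse leniently, print the result back, and accept
// the input only if the two texts match exactly.
bool parse_ll_roundtrip(const std::string& text, long long& out) {
    long long value = std::strtoll(text.c_str(), nullptr, 10);
    if (std::to_string(value) != text) {
        // Covers whitespace, '+', leading zeros, "-0", trailing garbage,
        // empty input, and overflow (the clamped value prints differently).
        return false;
    }
    out = value;
    return true;
}
```

Because an out-of-range input is clamped to the minimum or maximum value, its printed form no longer matches the input, so the comparison also rejects overflow without a separate `errno` check.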

alfC answered Oct 13 '22 01:10


You basically want the num_get<char> facet of the C locale. It's somewhat complicated, so see this example. Basically, you have to call use_facet<num_get<char, string::iterator> >(locale::classic()).get(begin, end, ... , outputValue). Note that locale::classic is a function and has to be called.
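A hedged sketch of calling the facet directly. It deviates from the call above in one respect: it uses the default `num_get<char>` (which reads through `istreambuf_iterator<char>`) rather than a `string::iterator` specialization, on the assumption that the default one is the specialization reliably installed in the classic locale. The function name `parse_long_classic` and the `infer_base` parameter are invented for this example.

```cpp
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parse a long using the classic ("C") locale's
// num_get facet, independent of the global locale.
bool parse_long_classic(const std::string& text, bool infer_base, long& out) {
    std::istringstream stream(text);
    // num_get consults the ios_base's locale for digit grouping rules,
    // so the stream itself is imbued with the classic locale as well.
    stream.imbue(std::locale::classic());
    if (infer_base) {
        // With no basefield set, a 0 or 0x prefix selects the base.
        stream.unsetf(std::ios_base::basefield);
    }
    const std::num_get<char>& facet =
        std::use_facet<std::num_get<char> >(std::locale::classic());
    std::istreambuf_iterator<char> begin(stream), end;
    std::ios_base::iostate err = std::ios_base::goodbit;
    long value = 0;
    std::istreambuf_iterator<char> stop =
        facet.get(begin, end, stream, err, value);
    if (err & std::ios_base::failbit) {
        return false;  // no digits, or value out of range
    }
    if (stop != end) {
        return false;  // trailing garbage
    }
    out = value;
    return true;
}
```

Calling the facet directly bypasses the stream's whitespace skipping, so leading spaces are rejected. A leading '+' is still accepted, and for unsigned target types a leading '-' would need a separate check, as described in the question.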

MSalters answered Oct 13 '22 00:10