Precise floating-point<->string conversion

Tags:

c++

I am looking for a library function to convert floating point numbers to strings, and back again, in C++. The properties I want are that str2num(num2str(x)) == x and that num2str(str2num(x)) == x (as far as possible). The general property is that num2str should produce the simplest rational number that, when rounded to the nearest representable floating point number, gives you back the original number.

So far I've tried boost::lexical_cast:

double d = 1.34;
string_t s = boost::lexical_cast<string_t>(d);
printf("%s\n", s.c_str());
// outputs 1.3400000000000001

And I've tried std::ostringstream, which seems to work for most values if I do stream.precision(16). However, at precision 15 or 17 it either truncates or gives ugly output for things like 1.34. I don't think that precision 16 is guaranteed to have any of the properties I require, and I suspect it breaks down for many numbers.
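The suspicion about precision 16 can be checked directly. The following is a minimal sketch, assuming IEEE 754 doubles and a standard library that rounds correctly; `format_with_precision` and `round_trips` are hypothetical helper names, not part of any library.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: format x with the given number of significant digits.
std::string format_with_precision(double x, int digits) {
    std::ostringstream out;
    out.precision(digits);   // the default float format behaves like printf's %g
    out << x;
    return out.str();
}

// Hypothetical helper: true if parsing the formatted text gives back exactly x.
bool round_trips(double x, int digits) {
    const std::string text = format_with_precision(x, digits);
    return std::strtod(text.c_str(), nullptr) == x;
}
```

With x = 0.1 + 0.2, 16 digits should give "0.3", which parses back to a different double, while 17 digits gives "0.30000000000000004" and round-trips. For 1.34 the situation is reversed in terms of readability: 16 digits gives "1.34" and 17 digits gives "1.3400000000000001". So no single fixed precision is both always exact and always short.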

Is there a C++ library that has such a conversion? Or is such a conversion function already buried somewhere in the standard libraries or Boost?

The reason for wanting these functions is to save floating point values to CSV files, and then read them correctly. In addition, I'd like the CSV files to contain simple numbers as far as possible so they can be consumed by humans.

I know that the Haskell read/show functions already have the properties I am after, as do the BSD C libraries. The standard reference for string<->double conversion is a pair of papers from PLDI 1990:

  • How to read floating point numbers accurately, William Clinger
  • How to print floating-point numbers accurately, Guy Steele and Jon White

Any C++ library/function based on these would be suitable.

EDIT: I am fully aware that floating point numbers are inexact representations of decimal numbers, and that 1.34==1.3400000000000001. However, as the papers referenced above point out, that's no excuse for choosing to display it as "1.3400000000000001".

EDIT2: This blog post explains exactly what I'm looking for: http://drj11.wordpress.com/2007/07/03/python-poor-printing-of-floating-point/

Neil Mitchell asked Aug 21 '09 10:08


2 Answers

I am still unable to find a library that supplies the necessary code, but I did find some code that does work:

http://svn.python.org/view/python/branches/py3k/Python/dtoa.c?view=markup

By supplying a fairly small number of defines it's easy to abstract away the Python integration. This code does indeed meet all the properties I outlined.

Neil Mitchell answered Oct 20 '22 00:10


I think this does what you want, in combination with the standard library's strtod():

#include <stdio.h>
#include <stdlib.h>

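/* Format n into buf with the fewest significant digits, starting at 15,
   for which strtod() parses the text back to exactly n.
   Returns whatever the last snprintf() call returned. */
int dtostr(char* buf, size_t size, double n)
{
  int prec = 15;
  while(1)
  {
    int ret = snprintf(buf, size, "%.*g", prec, n);
    /* Give up at precision 18; otherwise stop once the text round-trips. */
    if(prec++ == 18 || n == strtod(buf, 0)) return ret;
  }
}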

A simple demo, which doesn't bother to check input words for trailing garbage:

int main(int argc, char** argv)
{
  int i;
  for(i = 1; i < argc; i++)
  {
    char buf[32];
    dtostr(buf, sizeof(buf), strtod(argv[i], 0));
    printf("%s\n", buf);
  }
  return 0;
}

Some example inputs:

% ./a.out 0.1 1234567890.1234567890 17 1e99 1.34 0.000001 0 -0 +INF NaN
0.1
1234567890.1234567
17
1e+99
1.34
1e-06
0
-0
inf
nan

I imagine your C library needs to conform to some sufficiently recent version of the standard in order to guarantee correct rounding.

I'm not sure I chose the ideal bounds on prec, but they must be close: 17 significant digits are always enough to round-trip a 64-bit IEEE double, so the upper bound could be tightened from 18 to 17. Similarly, I think 32 characters for buf are always sufficient but never all needed. Obviously this all assumes 64-bit IEEE doubles. It might be worth checking that assumption at compile time -- sizeof(double) == 8 would be a good start, though the preprocessor cannot evaluate sizeof, so this needs a compile-time assertion rather than an #if.
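One way to check these assumptions at compile time is with static_assert and std::numeric_limits. This is only a sketch in C++11; the constant names `kRoundTripDigits` and `kBufferSize` are hypothetical.

```cpp
#include <limits>

// Fail the build if double is not an IEEE 754 binary64 value.
static_assert(std::numeric_limits<double>::is_iec559,
              "double is not an IEEE 754 type");
static_assert(std::numeric_limits<double>::digits == 53,
              "double does not have a 53-bit significand");

// Hypothetical constant: significant decimal digits that always suffice
// for a double -> text -> double round trip (17 for binary64).
constexpr int kRoundTripDigits = std::numeric_limits<double>::max_digits10;

// Hypothetical constant: worst-case length of "%.17g" output is
// sign (1) + 17 digits + decimal point (1) + "e-308" (5) + NUL (1).
constexpr int kBufferSize = 1 + kRoundTripDigits + 1 + 5 + 1;
static_assert(kBufferSize <= 32, "a 32-byte buffer is large enough");
```

For binary64, max_digits10 is 17 and digits10 is 15, which matches the range the loop searches: 15 digits is the most that any decimal input is guaranteed to survive, and 17 is the most any double needs.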

The exponent is a bit messy, but it wouldn't be difficult to fix after breaking out of the loop but before returning, perhaps using memmove() or suchlike to shift things leftwards. I'm pretty sure there's guaranteed to be at most one + and at most one leading 0, and I don't think they can even both occur at the same time for prec >= 10 or so.
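A possible cleanup step, sketched here on std::string rather than with memmove() on the raw buffer. `tidy_exponent` is a hypothetical helper name, and the sketch assumes the input came from a "%g"-style conversion.

```cpp
#include <string>

// Hypothetical helper: rewrite "1e+99" as "1e99" and "1e-06" as "1e-6".
// Text without an exponent (including "inf" and "nan") is returned as-is.
std::string tidy_exponent(const std::string& text) {
    const std::string::size_type e = text.find('e');
    if (e == std::string::npos) return text;

    std::string::size_type pos = e + 1;
    std::string result = text.substr(0, pos);  // mantissa plus the 'e'
    if (pos < text.size() && (text[pos] == '+' || text[pos] == '-')) {
        if (text[pos] == '-') result += '-';    // keep '-', drop '+'
        ++pos;
    }
    // Skip leading zeros but always keep the final exponent digit.
    while (pos + 1 < text.size() && text[pos] == '0') ++pos;
    result.append(text, pos, std::string::npos);
    return result;
}
```

strtod() accepts the tidied forms as well, so the round-trip property should be unaffected.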

Likewise, if you'd rather ignore signed zero, as JavaScript does, you can easily handle it up front, e.g.:

if(n == 0) return snprintf(buf, size, "0");

I'd be curious to see a detailed comparison with that 3000-line monstrosity you dug up in the Python codebase. Presumably the short version is slower, or less correct, or something? It would be disappointing if it were neither....
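One difference that can be shown without the Python code: the loop never tries fewer than 15 digits, so it relies on %g stripping trailing zeros to get short output. The sketch below is a C++ restatement of the loop, capped at 17 digits, plus a hypothetical `parses_to` helper; it assumes IEEE 754 doubles and a C library whose printf and strtod round correctly.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// C++ restatement of the precision-raising loop from this answer:
// try 15, 16, then 17 significant digits until the text parses back to x.
std::string shortest_by_retry(double x) {
    char buf[32];
    for (int prec = 15; prec <= 17; ++prec) {
        std::snprintf(buf, sizeof buf, "%.*g", prec, x);
        if (std::strtod(buf, nullptr) == x) break;
    }
    return buf;
}

// Hypothetical helper: true if parsing the candidate text yields exactly x.
bool parses_to(const char* text, double x) {
    return std::strtod(text, nullptr) == x;
}
```

For the smallest subnormal double, 4.9406564584124654e-324, shortest_by_retry should return "4.94065645841247e-324", even though the much shorter "5e-324" parses to the same value. A shortest-digits algorithm in the style of the Steele and White paper would be expected to pick the shorter form, so the short version is at least less terse in such cases.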

zaphod answered Oct 19 '22 23:10