Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to perform high precision calculations in D?

For some universitary work i have to approximate some numbers - like the Euler one with series. Therefore i have to add very small numbers, but i have problems with the precision. If the number ist very small it doesn't influence the result.

real s;  //sum of all previous terms
ulong k; //factorial

s += 1.0/ k;

after each step, k gets even smaller, but after the 10th round the result isn't changeing anymore and stuck at 2.71828

like image 617
NaN Avatar asked Dec 10 '22 11:12

NaN


1 Answers

Fixed-precision floating point types, the ones natively supported by your CPU's floating point unit (float, double, real) are not optimal for any calculation that needs many digits of precision, such as the example you've given.

The problem is that these floating-point types have a finite number of digits of precision (binary digits, actually) that limits the length of number that can be represented by such a data type. The float type has a limit of approximately 7 decimal digits (e.g. 3.141593); the double type is limited to 14 (e.g. 3.1415926535898); and the real type has a similar limit (slightly more than that of double).

Adding exceedingly small numbers to a floating-point value will therefore result in those digits being lost. Watch what happens when we add the following two float values together:

float a = 1.234567f, b = 0.0000000001234567
float c = a + b;

writefln("a = %f b = %f c = %f", a, b, c);

Both a and b are valid float values and retain approximately 7 digits of precision apiece in isolation. But when added, only the frontmost 7 digits are preserved because it's getting shoved back into a float:

1.2345670001234567 => 1.234567|0001234567 => 1.234567
                              ^^^^^^^^^^^
                         sent to the bit bucket

So c ends up equal to a because the finer digits of precision from the addition of a and b get whacked off.

Here's another explanation of the concept, probably much better than mine.


The answer to this problem is arbitrary-precision arithmetic. Unfortunately, support for arbitrary-precision arithmetic is not in CPU hardware; therefore, it's not (typically) in your programming language. However, there are many libraries that support arbitrary-precision floating-point types and the math you want to perform on them. See this question for some suggestions. You probably won't find any D-specific libraries for this purpose today, but there are plenty of C libraries (GMP, MPFR, and so on) that should be easy enough to use in isolation, and even more so if you can find D bindings for one of them.

like image 152
jgottula Avatar answered Feb 01 '23 02:02

jgottula