For some universitary work i have to approximate some numbers - like the Euler one with series. Therefore i have to add very small numbers, but i have problems with the precision. If the number ist very small it doesn't influence the result.
real s; //sum of all previous terms
ulong k; //factorial
s += 1.0/ k;
after each step, k gets even smaller, but after the 10th round the result isn't changeing anymore and stuck at 2.71828
Fixed-precision floating point types, the ones natively supported by your CPU's floating point unit (float
, double
, real
) are not optimal for any calculation that needs many digits of precision, such as the example you've given.
The problem is that these floating-point types have a finite number of digits of precision (binary digits, actually) that limits the length of number that can be represented by such a data type. The float
type has a limit of approximately 7 decimal digits (e.g. 3.141593); the double
type is limited to 14 (e.g. 3.1415926535898); and the real
type has a similar limit (slightly more than that of double
).
Adding exceedingly small numbers to a floating-point value will therefore result in those digits being lost. Watch what happens when we add the following two float values together:
float a = 1.234567f, b = 0.0000000001234567
float c = a + b;
writefln("a = %f b = %f c = %f", a, b, c);
Both a
and b
are valid float values and retain approximately 7 digits of precision apiece in isolation. But when added, only the frontmost 7 digits are preserved because it's getting shoved back into a float:
1.2345670001234567 => 1.234567|0001234567 => 1.234567
^^^^^^^^^^^
sent to the bit bucket
So c
ends up equal to a
because the finer digits of precision from the addition of a
and b
get whacked off.
Here's another explanation of the concept, probably much better than mine.
The answer to this problem is arbitrary-precision arithmetic. Unfortunately, support for arbitrary-precision arithmetic is not in CPU hardware; therefore, it's not (typically) in your programming language. However, there are many libraries that support arbitrary-precision floating-point types and the math you want to perform on them. See this question for some suggestions. You probably won't find any D-specific libraries for this purpose today, but there are plenty of C libraries (GMP, MPFR, and so on) that should be easy enough to use in isolation, and even more so if you can find D bindings for one of them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With