More Precise Floating point Data Types than double?

Tags:

c++

types

In my project I have to perform division, multiplication, subtraction, and addition on a matrix of double elements. The problem is that as the matrix grows, the accuracy of my output degrades drastically. Currently I am using double for each element, which I believe uses 8 bytes of memory and has about 16 significant digits of precision regardless of where the decimal point falls. Even for a large matrix, all the elements together occupy only a few kilobytes, so I can afford data types that need more memory. Which data type is more precise than double? Searching in some books, I found long double, but I don't know what its precision is. And what if I want more precision than that?

Cool_Coder asked Mar 27 '13 13:03



2 Answers

According to Wikipedia, the 80-bit "Intel" x87 format used for long double, an IEEE 754 extended-precision type stored as 80 bits padded to 16 bytes in memory on x86-64, has a 64-bit mantissa with no implicit bit, which gets you about 19.26 decimal digits. This has long been the near-universal long double on x86 compilers (Microsoft's compiler is an exception: its long double is identical to double), but recently things have started to change.

The newer 128-bit quad-precision format has 112 mantissa bits plus an implicit bit, which gets you about 34 decimal digits. GCC implements it as the __float128 type, and there is (if memory serves) a compiler option to make long double use it.

Potatoswatter answered Nov 13 '22 05:11


You might want to consider the order of operations, i.e. perform the additions in a sorted sequence, starting with the smallest values first. This increases the overall accuracy of the result with the same mantissa precision:

1e00 + 1e-16 + ... + 1e-16 (1e16 times) = 1e00
1e-16 + ... + 1e-16 (1e16 times) + 1e00 = 2e00

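The point is that adding a small number to a much larger one can make it disappear entirely: if the small value is less than half a unit in the last place of the large one, the sum rounds back to the large value. Adding the small values together first lets them accumulate to a magnitude that survives, so the latter order reduces the numerical error.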

ogni42 answered Nov 13 '22 05:11