
Handling a quadruple-precision floating point (128-bit) number in Java

I need to use numbers coming from another system that are 128-bit (quadruple-precision) floating point numbers in Java.

Since Java has no equivalent type, I would like to reduce the precision of the numbers in Java code so they can be stored in a Java double. This can be done fairly easily in C or assembly, but I would like to do it purely in Java.

It is fair to assume that the quadruple-precision number is stored in a 16-byte (128-bit) byte array in Java.

Is there a good solution using only Java? Thanks.

user474762 asked Jan 10 '14 18:01


3 Answers

I was so intrigued by this question that I was compelled to write a library to handle IEEE-754 floating point numbers. With the library, you can use the following:

byte[] quadBytes; // your quad-floating point number in 16 bytes
IEEE754 quad = IEEE754.decode(IEEE754Format.QUADRUPLE, 
        BitUtils.wrapSource(quadBytes));
// IEEE754 holds the number in a 'lossless' format

From there, you can write the value out in double format and read it back:

ByteBuffer doubleBuffer = ByteBuffer.allocateDirect(8);
quad.toBits(IEEE754Format.DOUBLE, BitUtils.wrapSink(doubleBuffer));
doubleBuffer.rewind();
double converted = doubleBuffer.asDoubleBuffer().get();

The snippet above only illustrates general usage; for double, a shorthand is provided:

double converted = quad.doubleValue();

The code is available at kerbaya.com/ieee754lib.

Glenn Lane answered Oct 05 '22 03:10


Depending on the size of the data set, a BigDecimal instantiated from an imported String representation might be an easy and accurate option. I assume one can export string representations of those numbers from any programming language.
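A minimal sketch of this approach, assuming the other system can export each value as a plain decimal string. The class name QuadFromString and the sample value are made up for illustration. Note that BigDecimal cannot represent NaN or infinities (its constructor throws NumberFormatException for such strings), so those would need separate handling:

```java
import java.math.BigDecimal;

final class QuadFromString {
    // Parses the exported decimal text exactly, then narrows it to a double.
    static double toDouble(String decimalText) {
        return new BigDecimal(decimalText).doubleValue();
    }

    public static void main(String[] args) {
        // Hypothetical exported value with more digits than a double can hold.
        String exported = "3.14159265358979323846264338327950288";

        BigDecimal exact = new BigDecimal(exported); // keeps every digit
        double narrowed = toDouble(exported);        // reduced to double precision

        System.out.println(exact);
        System.out.println(narrowed);
    }
}
```

Keeping the BigDecimal around is useful when further arithmetic should happen at full precision; call doubleValue() only at the point where a double is actually required.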

Oleg Sklyar answered Oct 05 '22 05:10


Although the question was asked rather long ago, it may still be of interest to someone. There is a Java class for 128-bit floating point arithmetic that has methods for converting 128-bit IEEE-754 floating-point values into its own internal representation without any loss of precision. It can perform arithmetic operations on such values and convert them back to IEEE-754 binary128, as well as to other common numeric types such as BigDecimal, double and long. It can also parse strings containing decimal representations of such values and convert the values back to strings. Internally, it stores 128 bits of the mantissa, so the relative error of the calculations does not exceed 1.47e-39.
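The class mentioned above is not reproduced here. As a rough, independent sketch of what a pure-Java binary128-to-double conversion involves, the code below decodes the sign bit, the 15-bit exponent (bias 16383) and the 112-bit fraction, then rounds to the nearest double with ties to even. It assumes the 16 bytes are in big-endian order (sign bit in byte 0); the class and method names are invented for this example, and NaN payloads are not preserved:

```java
import java.math.BigInteger;

final class Binary128ToDouble {
    private static final int QUAD_BIAS = 16383;
    private static final int FRACTION_BITS = 112;

    /** Converts 16 big-endian bytes holding a binary128 value to the nearest double. */
    static double toDouble(byte[] quad) {
        if (quad.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes");
        }
        boolean negative = (quad[0] & 0x80) != 0;
        int biasedExp = ((quad[0] & 0x7F) << 8) | (quad[1] & 0xFF);
        byte[] fracBytes = new byte[14];
        System.arraycopy(quad, 2, fracBytes, 0, 14);
        BigInteger fraction = new BigInteger(1, fracBytes); // 112-bit unsigned fraction

        double magnitude;
        if (biasedExp == 0x7FFF) {
            // All-ones exponent: infinity if the fraction is zero, otherwise NaN.
            magnitude = fraction.signum() == 0 ? Double.POSITIVE_INFINITY : Double.NaN;
        } else if (biasedExp == 0) {
            // Zero or a quad subnormal; both are far below the smallest double.
            magnitude = 0.0;
        } else {
            // Normal number: restore the implicit leading 1 bit.
            magnitude = round(fraction.setBit(FRACTION_BITS), biasedExp - QUAD_BIAS);
        }
        return negative ? -magnitude : magnitude;
    }

    // significand is a 113-bit integer; the value is significand * 2^(exponent - 112).
    private static double round(BigInteger significand, int exponent) {
        if (exponent > 1023) return Double.POSITIVE_INFINITY; // overflow
        if (exponent < -1075) return 0.0;                     // underflow to zero
        int doubleBiased = exponent + 1023;
        // Drop 60 bits for a normal result, more when the result is a double subnormal.
        int shift = 60 + (doubleBiased >= 1 ? 0 : 1 - doubleBiased);
        long kept = significand.shiftRight(shift).longValue();
        BigInteger half = BigInteger.ONE.shiftLeft(shift - 1);
        BigInteger dropped = significand.and(half.shiftLeft(1).subtract(BigInteger.ONE));
        int cmp = dropped.compareTo(half);
        if (cmp > 0 || (cmp == 0 && (kept & 1L) == 1L)) {
            kept++; // round to nearest, ties to even
        }
        // Adding (rather than OR-ing) lets a rounding carry flow into the exponent field.
        long expField = doubleBiased >= 1 ? (long) (doubleBiased - 1) << 52 : 0L;
        return Double.longBitsToDouble(expField + kept);
    }

    public static void main(String[] args) {
        byte[] oneAndHalf = new byte[16];
        oneAndHalf[0] = 0x3F;        // sign 0, exponent 0x3FFF (unbiased 0)
        oneAndHalf[1] = (byte) 0xFF;
        oneAndHalf[2] = (byte) 0x80; // top fraction bit set: 1 + 0.5
        System.out.println(toDouble(oneAndHalf)); // prints 1.5
    }
}
```

If the source system writes the value in little-endian order, the byte array would need to be reversed before calling toDouble.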

m. vokhm answered Oct 05 '22 04:10