I need to use numbers coming from another system that are 128-bit (quadruple-precision) floating-point values in Java.
Since there is no equivalent type in Java, I would like to reduce the precision of the numbers in Java code so they can be stored in a Java double. This is fairly easy in C or assembly, but I would like to do it purely in Java.
It is fair to assume that the quadruple-precision number is stored in a 16-byte (128-bit) byte array in Java.
Is there a good solution using only Java? Thanks.
In computing, quadruple precision (or quad precision) is a binary floating point–based computer number format that occupies 16 bytes (128 bits) with precision at least twice the 53-bit double precision.
The float and double primitive types in Java are floating-point numbers, where the number is stored as a binary representation of a fraction and an exponent. More specifically, a double-precision floating-point value such as the double type is a 64-bit value, where 1 bit denotes the sign (positive or negative), 11 bits hold the exponent, and 52 bits hold the fraction.
The XDR standard defines the encoding for the double-precision floating-point data type as a double. The length of a double is 64 bits or 8 bytes. Doubles are encoded using the IEEE standard for normalized double-precision floating-point numbers.
Double-precision floating-point format (sometimes called FP64 or float64) is a computer number format, usually occupying 64 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.
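Given the two layouts described above, the narrowing can also be sketched by hand in plain Java. The class below is only an illustration, not taken from any library: the names QuadToDouble, toDouble and pack are hypothetical, and it assumes the 16 bytes are in big-endian order (sign bit first, then a 15-bit exponent with bias 16383, then 112 fraction bits). It rounds to nearest, ties to even, and collapses all NaN payloads into Double.NaN.

```java
import java.nio.ByteBuffer;

// Hypothetical helper class: a sketch of binary128 -> double narrowing.
final class QuadToDouble {
    private static final long INF_BITS = 0x7FF0000000000000L;

    private QuadToDouble() {}

    /** Converts 16 big-endian binary128 bytes to the nearest double. */
    static double toDouble(byte[] quad) {
        if (quad == null || quad.length != 16) {
            throw new IllegalArgumentException("expected exactly 16 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(quad); // big-endian by default
        long hi = buf.getLong();
        long lo = buf.getLong();

        long sign = hi & 0x8000000000000000L;
        int biasedExp = (int) ((hi >>> 48) & 0x7FFF);
        long fracHi = hi & 0x0000FFFFFFFFFFFFL; // top 48 fraction bits

        if (biasedExp == 0x7FFF) {              // infinity or NaN
            if ((fracHi | lo) != 0) return Double.NaN;
            return Double.longBitsToDouble(sign | INF_BITS);
        }
        if (biasedExp == 0) {                   // zero or quad subnormal:
            return Double.longBitsToDouble(sign); // far below double range
        }
        int exp = biasedExp - 16383;            // unbiased exponent
        if (exp > 1023) {                       // too large for a double
            return Double.longBitsToDouble(sign | INF_BITS);
        }

        long frac52 = (fracHi << 4) | (lo >>> 60); // 52 bits that are kept
        long rest = lo & 0x0FFFFFFFFFFFFFFFL;      // 60 bits that are dropped
        long bits;
        boolean roundUp;
        if (exp >= -1022) {                     // result is a normal double
            bits = ((long) (exp + 1023) << 52) | frac52;
            long half = 1L << 59;
            roundUp = rest > half || (rest == half && (frac52 & 1) == 1);
        } else {                                // result is subnormal or zero
            int shift = -1022 - exp;
            if (shift > 53) return Double.longBitsToDouble(sign);
            long sig = (1L << 52) | frac52;     // restore the implicit 1
            bits = sig >>> shift;
            long dropped = sig & ((1L << shift) - 1);
            long half = 1L << (shift - 1);
            boolean sticky = rest != 0;         // nonzero bits below the window
            roundUp = dropped > half
                    || (dropped == half && (sticky || (bits & 1) == 1));
        }
        if (roundUp) bits++; // a carry into the exponent field is intended
        return Double.longBitsToDouble(sign | bits);
    }

    /** Hypothetical convenience: packs two longs into 16 big-endian bytes. */
    static byte[] pack(long hi, long lo) {
        return ByteBuffer.allocate(16).putLong(hi).putLong(lo).array();
    }
}
```

For example, QuadToDouble.toDouble(QuadToDouble.pack(0x3FFF800000000000L, 0L)) yields 1.5. If the source system writes the value little-endian, the byte order would have to be reversed first.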
I was so intrigued by this question that I was compelled to write a library to handle IEEE-754 floating point numbers. With the library, you can use the following:
byte[] quadBytes; // your quad-floating point number in 16 bytes
IEEE754 quad = IEEE754.decode(IEEE754Format.QUADRUPLE,
        BitUtils.wrapSource(quadBytes));
// IEEE754 holds the number in a 'lossless' format
From there, you can write the value out in double format:
ByteBuffer doubleBuffer = ByteBuffer.allocateDirect(8);
quad.toBits(IEEE754Format.DOUBLE, BitUtils.wrapSink(doubleBuffer));
doubleBuffer.rewind();
double converted = doubleBuffer.asDoubleBuffer().get();
The snippet above only illustrates general usage; for double, a shorthand is provided:
double converted = quad.doubleValue();
The code is available at kerbaya.com/ieee754lib.
Depending on the size of the data set, a BigDecimal instantiated from an imported String representation might be an easy and accurate option. I assume one can export string representations of those numbers from any programming language.
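As a rough sketch of that idea, assuming the other system exports each value as plain decimal text (the class and method names here are hypothetical):

```java
import java.math.BigDecimal;

// Hypothetical helper: narrows an exported decimal string to a double.
final class QuadStringImport {
    private QuadStringImport() {}

    static double parse(String decimalText) {
        // BigDecimal keeps every exported digit; doubleValue() then
        // narrows the value to a double.
        return new BigDecimal(decimalText.trim()).doubleValue();
    }
}
```

Note that the BigDecimal constructor rejects text such as "NaN" or "Infinity" with a NumberFormatException, so special values would need separate handling. Keeping the BigDecimal itself, rather than calling doubleValue(), preserves the full exported precision.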
Although the question was asked rather long ago, it may still be of interest to someone. There is a Java class for 128-bit floating-point arithmetic that has methods for converting 128-bit IEEE-754 floating-point values into its own internal representation without any loss of precision. It can perform arithmetic operations on such values and convert them back to IEEE-754 binary128, as well as to other common numeric types such as BigDecimal, double and long. It can also parse strings containing decimal representations of such values and convert them back to strings. Internally, it stores 128 bits of the mantissa, so the relative error of the calculations does not exceed 1.47e-39.