I want to emulate the x86 extended precision type and perform arithmetic operations and casts to other types in Java.
I could try to implement it using BigDecimal, but covering all the special cases around NaNs, infinity, and casts would probably be a tedious task. I am aware of some libraries that provide other floating-point types with higher precision than double, but I want exactly the same precision as the x86 80-bit float.
Is there a Java library that provides such a floating-point type? If not, can you give other hints that would make implementing such a data type less effort than a custom BigDecimal solution?
float data type: a 32-bit, single-precision IEEE 754 (Standard for Floating-Point Arithmetic) floating-point number. Its 24-bit significand gives about 6-7 significant decimal digits of precision.
Floating-point numbers are used to represent numbers with a fractional part (such as 5.3 or 99.234). Whole numbers can also be represented, but as a floating-point value the number 5 is stored as 5.0. In Java, floating-point numbers are represented by the types float and double.
Java uses a subset of the IEEE 754 binary floating-point standard to represent floating-point numbers and define the results of arithmetic operations. Virtually all modern computers conform to this standard. A float is represented using 32 bits. Most bit combinations represent a real number, but patterns with an all-ones exponent encode positive/negative infinity and NaN, and +0.0 and -0.0 are two different patterns for the same value.
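Note that not every one of the 2^32 float patterns is an ordinary number. This can be checked with the standard Float.intBitsToFloat and Float.floatToRawIntBits methods; the class and method names below are just for illustration.

```java
// Shows that some 32-bit float patterns are not ordinary real numbers.
final class FloatPatterns {
    // Exponent field all ones and fraction zero: +/- infinity.
    static boolean isInfinityPattern(int bits) {
        return Float.isInfinite(Float.intBitsToFloat(bits));
    }

    // Exponent field all ones and fraction non-zero: NaN (2^24 - 2 such patterns).
    static boolean isNaNPattern(int bits) {
        return Float.isNaN(Float.intBitsToFloat(bits));
    }

    // +0.0f and -0.0f compare equal but have different bit patterns.
    static boolean zerosHaveDistinctBits() {
        return 0.0f == -0.0f
            && Float.floatToRawIntBits(0.0f) != Float.floatToRawIntBits(-0.0f);
    }

    public static void main(String[] args) {
        System.out.println(isInfinityPattern(0x7F800000)); // true
        System.out.println(isNaNPattern(0x7F800001));      // true
        System.out.println(isNaNPattern(0x3F800000));      // false (this is 1.0f)
        System.out.println(zerosHaveDistinctBits());       // true
    }
}
```

These same special encodings (plus a few extra ones) are what make an 80-bit emulation tedious.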
If you know that your Java code will actually run on an x86 processor, implement the 80-bit arithmetic in assembly (or in C, if your compiler maps long double to the 80-bit format, as GCC does on x86) and invoke it with JNI.
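Whatever the native side looks like, Java has to hand it the raw 80-bit value. A minimal sketch, assuming the native code reads operands in the standard 10-byte little-endian x87 memory image (64-bit significand with an explicit integer bit, then sign and a 15-bit exponent biased by 16383). It converts a double into that image exactly, since every double is representable in the 80-bit format. The class X87Bytes and its native add method are hypothetical: no such library exists, and you would write and load the native part yourself.

```java
// Builds the 10-byte little-endian memory image of an x87 80-bit value:
// bytes 0-7 hold the significand, bytes 8-9 hold sign (bit 15) and exponent.
final class X87Bytes {
    private static final int BIAS_DIFF = 16383 - 1023; // extended bias minus double bias

    static byte[] fromDouble(double d) {
        long bits = Double.doubleToRawLongBits(d);
        int sign = (int) (bits >>> 63);
        int exp = (int) ((bits >>> 52) & 0x7FF);
        long frac = bits & 0x000FFFFFFFFFFFFFL;
        int signExp;
        long mant;
        if (exp == 0x7FF) {             // infinity or NaN: all-ones exponent, integer bit set
            signExp = 0x7FFF;
            mant = Long.MIN_VALUE | (frac << 11);
        } else if (exp == 0) {
            if (frac == 0) {            // signed zero
                signExp = 0;
                mant = 0;
            } else {                    // double subnormal becomes a normal 80-bit value
                int lz = Long.numberOfLeadingZeros(frac);
                signExp = 15372 - lz;   // frac * 2^-1074 == (frac << lz) * 2^(signExp - 16383 - 63)
                mant = frac << lz;
            }
        } else {                        // normal double: make the implicit 1 explicit
            signExp = exp + BIAS_DIFF;
            mant = Long.MIN_VALUE | (frac << 11);
        }
        signExp |= sign << 15;
        byte[] out = new byte[10];
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (mant >>> (8 * i));
        }
        out[8] = (byte) signExp;
        out[9] = (byte) (signExp >>> 8);
        return out;
    }

    // Hypothetical native entry point (not an existing API): e.g. a routine that
    // loads both operands with FLD, adds them, and stores the 10-byte result.
    // Load your own native library before calling it.
    static native byte[] add(byte[] a, byte[] b);
}
```

For example, fromDouble(1.0) yields the bytes 00 00 00 00 00 00 00 80 FF 3F. Converting in the other direction (80-bit to double) requires rounding the 64-bit significand to 53 bits, which is where the native side or a careful Java port earns its keep.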
If you are targeting a particular non-x86 platform, look at QEMU's source. It should be possible to extract just the part that implements 80-bit floating-point operations (Edit: QEMU's implementation is SoftFloat.) and call it with JNI.
If you truly want cross-platform pure-Java 80-bit arithmetic, you could probably still compare it against the C implementation in open-source CPU emulators to make sure you're addressing the right corner cases.
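Because the 80-bit format stores its integer bit explicitly, it has encodings that float and double do not have at all: pseudo-denormals, unnormals, pseudo-infinities and pseudo-NaNs. Per Intel's manuals, the 387 and later FPUs treat unnormals, pseudo-infinities and pseudo-NaNs as invalid operands, so these are good inputs for such a comparison. A sketch of a classifier, assuming the value is held as an int with sign and exponent plus a long significand; the names X87Class and Kind are made up for this example.

```java
// Classifies an x87 80-bit encoding, including the encodings that exist only
// because the integer bit (bit 63 of the significand) is stored explicitly.
final class X87Class {
    enum Kind {
        ZERO, DENORMAL, PSEUDO_DENORMAL, NORMAL, UNNORMAL,
        INFINITY, PSEUDO_INFINITY, QUIET_NAN, SIGNALING_NAN, PSEUDO_NAN
    }

    // signExp: sign in bit 15, biased exponent in bits 0-14; mant: 64-bit significand.
    static Kind classify(int signExp, long mant) {
        int exp = signExp & 0x7FFF;
        boolean integerBit = mant < 0;             // bit 63
        long fraction = mant & Long.MAX_VALUE;     // bits 62..0
        if (exp == 0) {
            if (integerBit) return Kind.PSEUDO_DENORMAL;
            return mant == 0 ? Kind.ZERO : Kind.DENORMAL;
        }
        if (exp == 0x7FFF) {
            if (!integerBit) return fraction == 0 ? Kind.PSEUDO_INFINITY : Kind.PSEUDO_NAN;
            if (fraction == 0) return Kind.INFINITY;
            return (mant & (1L << 62)) != 0 ? Kind.QUIET_NAN : Kind.SIGNALING_NAN;
        }
        return integerBit ? Kind.NORMAL : Kind.UNNORMAL;
    }
}
```

A differential test could then generate values from each Kind (plus random bit patterns), run them through both the Java code and the C reference, and compare the results bit for bit.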
An 80-bit value is best held as a combination of a long (for the 64-bit mantissa) and an int for the exponent and sign. For many operations, it will probably be most practical to place the upper and lower 32-bit halves of the long into separate "long" values, so that carries have room to appear. The code for adding two numbers with matching signs and exponents would then be something like:
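long resultLo = (num1.mant & 0xFFFFFFFFL) + (num2.mant & 0xFFFFFFFFL);
long resultHi = (num1.mant >>> 32) + (num2.mant >>> 32) + (resultLo >>> 32);
resultLo &= 0xFFFFFFFFL;          // the carry out of the low half now lives in resultHi
result.exp = num1.exp;            // Should match num2.exp
if (resultHi > 0xFFFFFFFFL) {     // carry out of bit 63: shift the whole 65-bit sum right by one
    result.exp++;
    long lostBit = resultLo & 1;  // the bit that falls off the end
    resultLo = (resultLo >>> 1) | ((resultHi & 1) << 31);
    resultHi >>>= 1;
    if (lostBit != 0 && (resultLo & 1) != 0) { // round half to even
        resultLo++;
        resultHi += resultLo >>> 32;           // propagate a carry out of the low half
        resultLo &= 0xFFFFFFFFL;
    }
}
result.mant = (resultHi << 32) | resultLo;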
A bit of a nuisance all around, but not completely unworkable. The key is to break numbers into pieces small enough that you can do all your math as type "long".
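The same "pieces small enough for a long" idea is what makes multiplication workable: the product of two 64-bit significands needs 128 bits, but splitting each operand into 32-bit halves keeps every partial product within 64 bits. A minimal sketch (the class name Mul128 is made up; Java 18+ also offers Math.unsignedMultiplyHigh for the high half):

```java
// Full 64 x 64 -> 128-bit unsigned multiply built from 32-bit pieces,
// so that every partial product fits in a long.
final class Mul128 {
    private static final long MASK = 0xFFFFFFFFL;

    // Returns {high 64 bits, low 64 bits} of the unsigned product a * b.
    static long[] multiply(long a, long b) {
        long aLo = a & MASK, aHi = a >>> 32;
        long bLo = b & MASK, bHi = b >>> 32;

        long ll = aLo * bLo;   // each partial product is < 2^64, exact as unsigned bits
        long lh = aLo * bHi;
        long hl = aHi * bLo;
        long hh = aHi * bHi;

        // Middle column: at most 3 * (2^32 - 1), so it cannot overflow.
        long mid = (ll >>> 32) + (lh & MASK) + (hl & MASK);

        long lo = (mid << 32) | (ll & MASK);
        long hi = hh + (lh >>> 32) + (hl >>> 32) + (mid >>> 32);
        return new long[] {hi, lo};
    }
}
```

For two normalized significands (bit 63 set) the product lies in [2^126, 2^128), so if bit 63 of the high word is clear, the 128-bit result must be shifted left by one (with the exponent adjusted) before rounding; the low word supplies the bits for that rounding decision.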
BTW, note that if the two numbers did not originally have the same exponent, you will need to keep track of whether any bits "fell off the end" when shifting the smaller one right to match the other's exponent, so that the result can be rounded properly afterward.
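One common way to keep track of those bits is a guard bit (the most significant bit shifted out) plus a sticky flag (whether any lower shifted-out bit was 1); together they are enough for round-to-nearest-even. This is the standard guard/sticky technique rather than anything from the answer above, and the class name Align is made up. Note that Java masks shift distances to 6 bits, so shifts of 64 or more need explicit handling.

```java
// Right-shifts a significand to align exponents while remembering what was lost.
final class Align {
    final long mant;       // shifted significand
    final boolean guard;   // first bit that fell off
    final boolean sticky;  // OR of all later bits that fell off

    private Align(long mant, boolean guard, boolean sticky) {
        this.mant = mant;
        this.guard = guard;
        this.sticky = sticky;
    }

    static Align shiftRight(long mant, int shift) {
        if (shift <= 0) return new Align(mant, false, false);   // nothing to shift
        // Java's >>> uses only the low 6 bits of the distance, so handle 64+ separately.
        if (shift == 64) return new Align(0, mant < 0, (mant & Long.MAX_VALUE) != 0);
        if (shift > 64) return new Align(0, false, mant != 0);
        boolean guard = ((mant >>> (shift - 1)) & 1) != 0;
        boolean sticky = (mant & ((1L << (shift - 1)) - 1)) != 0;
        return new Align(mant >>> shift, guard, sticky);
    }

    // Round to nearest, ties to even: round up if more than half was lost,
    // or exactly half was lost and the kept value is odd.
    boolean roundUp() {
        return guard && (sticky || (mant & 1) != 0);
    }
}
```

This only covers the alignment step. After the addition itself (and any normalizing shift, as in the carry case above), the guard and sticky bits have to be updated before deciding whether to round.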