
IEEE-754 floating-point precision: How much error is allowed?

I'm working on porting the sqrt function (for 64-bit doubles) from fdlibm to a model-checking tool I'm using at the moment (cbmc).
While doing so, I read a lot about the IEEE-754 standard, but I don't think I understood its precision guarantees for the basic operations (including sqrt).

Testing my port of fdlibm's sqrt, I got the following calculation with sqrt on a 64-bit double:

sqrt(1977061516825203605555216616167125005658976571589721139027150498657494589171970335387417823661417383745964289845929120708819092392090053015474001800648403714048.0) = 44464159913633855548904943164666890000299422761159637702558734139742800916250624.0

(this case broke a simple post-condition in my test regarding precision; I'm no longer sure whether such a post-condition is even achievable under IEEE-754)

For a comparison, several multi-precision tools calculated something like:

sqrt(1977061516825203605555216616167125005658976571589721139027150498657494589171970335387417823661417383745964289845929120708819092392090053015474001800648403714048.0) = 44464159913633852501611468455197640079591886932526256694498106717014555047373210.… (truncated)

One can see that the 17th digit from the left differs, which amounts to a difference of:

3047293474709469249920707535828633381008060627422728245868877413.0

Question 1: Is this huge amount of error allowed?

The standard says that every basic operation (+, -, *, /, sqrt) should be within 0.5 ulp, meaning that it should be equal to the mathematically exact result rounded to the nearest floating-point representation (Wikipedia says that some libraries guarantee only 1 ulp, but that isn't important at the moment).

Question 2: Does that mean that every basic operation should have an error < 2.220446e-16 (machine epsilon) with 64-bit doubles?

I calculated the same thing on an x86-32 Linux system (glibc/eglibc) and got the same result as with fdlibm, which makes me think that:

  • a: I did something wrong (but how? printf would be a candidate, but I don't know if that could be the reason)
  • b: this error/precision is common in these libraries
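To rule out option (a), one can check with exact rational arithmetic whether the returned value is the correctly rounded square root, i.e. whether the true root lies between the midpoints separating the result from its two neighbouring doubles. A quick sanity-check sketch (assuming Python 3.9+ for math.nextafter; not part of fdlibm):

```python
import math
from fractions import Fraction

x = 1977061516825203605555216616167125005658976571589721139027150498657494589171970335387417823661417383745964289845929120708819092392090053015474001800648403714048.0
y = math.sqrt(x)  # delegates to the platform's sqrt

# y is the correctly rounded root iff sqrt(x) lies between the midpoints
# separating y from its two neighbouring doubles (ties covered by <=).
yf = Fraction(y)
lo = Fraction(math.nextafter(y, 0.0))       # next double towards zero
hi = Fraction(math.nextafter(y, math.inf))  # next double towards +inf

assert ((yf + lo) / 2) ** 2 <= Fraction(x) <= ((yf + hi) / 2) ** 2
print("sqrt(x) is correctly rounded")
```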
asked Nov 30 '10 20:11 by sascha


1 Answer

The IEEE-754 standard requires that the so-called "basic operations" (which include addition, subtraction, multiplication, division, and square root) be correctly rounded. This means that there is a unique allowed answer, and it is the representable floating-point number closest to the so-called "infinitely precise" result of the operation.

In double precision, numbers have 53 bits of significand precision, so the correct answer is the exact result rounded to 53 significant bits. As Rick Regan showed in his answer, this is exactly the result that you got.

The answers to your questions are:

Question 1: Is this huge amount of error allowed?

Yes, but it is quite misleading to call this error "huge". The fact is that there is no double-precision value that could be returned that would have a smaller error.
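For context, the gap between adjacent doubles (one ulp) at this magnitude is about 6.6e63, so the observed difference is actually below half an ulp. A quick check in Python (3.9+ for math.ulp), using the two decimal values from the question:

```python
import math

y = 44464159913633855548904943164666890000299422761159637702558734139742800916250624.0
diff = 3047293474709469249920707535828633381008060627422728245868877413.0

ulp = math.ulp(y)        # spacing between y and the next larger double: 2**212 here
print(diff < 0.5 * ulp)  # True: the "huge" absolute error is under half an ulp
```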

Question 2: Does that mean, that every basic operation should have an error < 2.220446e-16 with 64-bit doubles (machine-epsilon)?

No. It means that every basic operation should be rounded to the (unique) closest representable floating-point number according to the current rounding mode. This is not quite the same as saying that the relative error is bounded by machine epsilon.
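To make that concrete: for a normal result, correct rounding bounds the *relative* error by 2^-53 ≈ 1.11e-16 (half the machine epsilon), and the example value meets that bound. A sketch using exact integer arithmetic (Python chosen just for its big-integer support):

```python
import math
from fractions import Fraction

x = 1977061516825203605555216616167125005658976571589721139027150498657494589171970335387417823661417383745964289845929120708819092392090053015474001800648403714048.0
y = math.sqrt(x)

# floor(sqrt(x) * 2**100): the true root computed with 100 extra guard bits
r = math.isqrt(int(x) << 200)
true_root = Fraction(r, 1 << 100)   # undershoots sqrt(x) by less than 2**-100

rel_err = abs(Fraction(y) - true_root) / true_root
print(float(rel_err))                  # about 6.9e-17 on this input
assert rel_err < Fraction(1, 2 ** 53)  # within half the machine epsilon
```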

Question 3: Which result do you obtain with your x86 hardware and gcc + libc?

The same answer you did, because sqrt is correctly rounded on any reasonable platform.

answered Sep 23 '22 01:09 by Stephen Canon