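There seem to be two definitions of the machine epsilon:
1. The maximum relative error when rounding a real number to the next floating-point number.
2. The minimum positive number such that 1.0 + machine_eps != 1.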
First, I fail to see how these two are related. Second, in my understanding DBL_EPSILON does not conform to Definition 2:
The following program prints:

```
DBL_EPSILON: 2.220446049250313080847e-16
DBL_EPSILON / 2: 1.110223024625156540424e-16
1.0 + DBL_EPSILON: 1.000000000000000222045e+00
1.0 + DBL_EPSILON / 2: 1.000000000000000000000e+00
m_eps 2.220446049250313080847e-16
m_eps -1u 2.220446049250312834328e-16
1.0 + m_eps -1u 1.000000000000000222045e+00
(m_eps -1u < DBL_EPSILON): True
(m_eps -1u == DBL_EPSILON/2): False
```
m_eps -1u should be a number smaller than, but very close to, DBL_EPSILON. According to Definition 2, shouldn't 1.0 + m_eps -1u evaluate to 1.0? Why is it necessary to divide DBL_EPSILON by 2 for this?
```c
#include <stdio.h>
#include <stdint.h>
#include <float.h>

/* Lets the bit pattern of a double be read and modified as a 64-bit integer. */
union Double_t {
    double f;
    int64_t i;
};

int main(void)
{
    union Double_t m_eps;

    printf("DBL_EPSILON: \t\t%.*e\n", DECIMAL_DIG, DBL_EPSILON);
    printf("DBL_EPSILON / 2: \t%.*e\n", DECIMAL_DIG, DBL_EPSILON / 2);
    printf("1.0 + DBL_EPSILON: \t%.*e\n", DECIMAL_DIG, 1.0 + DBL_EPSILON);
    printf("1.0 + DBL_EPSILON / 2: \t%.*e\n", DECIMAL_DIG, 1.0 + DBL_EPSILON / 2);

    /* Decrementing the bit pattern of a positive double gives the next smaller double. */
    m_eps.f = DBL_EPSILON;
    printf("\nm_eps \t\t\t%.*e\n", DECIMAL_DIG, m_eps.f);
    m_eps.i -= 1;
    printf("m_eps -1u\t\t%.*e\n", DECIMAL_DIG, m_eps.f);
    printf("\n1.0 + (m_eps -1u)\t\t%.*e\n", DECIMAL_DIG, 1.0 + m_eps.f);

    printf("\n(m_eps -1u < DBL_EPSILON): %s\n",
           (m_eps.f < DBL_EPSILON) ? "True" : "False");
    printf("(m_eps -1u == DBL_EPSILON/2): %s\n",
           (DBL_EPSILON / 2 == m_eps.f) ? "True" : "False");
    return 0;
}
```
A wrong definition of DBL_EPSILON, the one you quote as “The minimum positive number such that 1.0 + machine_eps != 1”, is floating around. You can even find it in standard libraries and in otherwise fine answers on StackOverflow. When it appears in a standard library, it is in a comment next to a value that obviously does not match the comment but does match the correct definition:
DBL_EPSILON: This is the difference between 1 and the smallest floating point number of type double that is greater than 1. (correct definition taken from the GNU C library)
The C99 standard phrases it this way:
the difference between 1 and the least value greater than 1 that is representable in the given floating point type, b^(1−p)
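Assuming double is IEEE 754 binary64 (typical, but not required by C99), the radix b is FLT_RADIX = 2 and the precision p is DBL_MANT_DIG = 53 (52 stored fraction bits plus the implicit leading bit), so the formula reproduces the value printed in the question:

```latex
\begin{aligned}
\varepsilon &= b^{1-p}, \qquad b = 2,\; p = 53 \\
            &= 2^{1-53} = 2^{-52} \approx 2.220446049250313 \times 10^{-16}
\end{aligned}
```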
This is probably the cause of your confusion. Forget about the wrong definition. I wrote a rant about this here (which is very much like your question).
The other definition in your question, “the maximum relative error when rounding a real number to the next floating-point number”, is correct-ish when the result of the rounding is a normal floating-point number. Rounding a real number to the nearest finite floating-point number produces a floating-point number within 1/2 ULP of the real value. For a normal floating-point number, this absolute error of 1/2 ULP translates to a relative error between DBL_EPSILON/4 and DBL_EPSILON/2, depending on where the floating-point number lies in its binade.