How can I compute `exp(x)/2` when `x` is large?

Question

In trying to write my sinh(x): double my_sinh(double x), I came across the corner case.

When x is in the 709.782... to 710.475... range, how to calculate exp(x)/2?

Hyperbolic sine sinh(x) and cosine cosh(x) are both well approximated by exp(x)/2 when x >= 20 (or so, depending on the type details of x).

With common double, the largest finite value DBL_MAX is about 1.798...x10³⁰⁸.
The most positive x with a finite sinh(x) and cosh(x) is about 710.475....
The most positive x with a finite exp(x) is about 709.782....

Since exp(710.0) overflows, the following /2 can be too late to achieve a finite result.

The code could use a wider precision/range, like long double, yet long double is not certainly any better than double, and I am looking for a double-only solution.

#include <assert.h>
#include <math.h>
#include <stdio.h>

double my_sinh_f1(double x) {
  assert(x >= 709);  // Only concerned about large values for now.
  return (double) (expl(x)/2);  // For now, call a long double function.
}

/*
 * Over a range of values:
 * Compute the difference between f(x) and higher precision sinhl.
 * Scale difference to the unit-in-the-last-place (ULP) of the double result.
 * Return the worst case ULP.
 */
double test_sinh(double (f)(double), double x0, double x1, unsigned n) {
  double error_max = 0.0;
  double dx = (x1 - x0)/n;
  for (double x = x0; x <= x1; x += dx) {
    long double y0_as_ld = sinhl(x);
    double y0 = (double) y0_as_ld;
    double ulp = y0 > y0_as_ld ? y0 - nextafter(y0,0) : nextafter(y0,INFINITY) - y0;
    double y1 = f(x);
    double error = (double) ((y1 - y0_as_ld)/ulp);
    error = fabs(error);
    if (error > error_max) {
      error_max = error;
    }
  }
  return error_max;
}

int main(void) {
  double (*f)(double) = my_sinh_f1;
  double x0= log(DBL_MAX);
  double x1 = asinh(DBL_MAX);
  unsigned n = 1000000007; //1000003;  // Some prime
  printf("x0:%.3f, x1:%.3f n:%10u error:%g(ulp)
", x0, x1, n, test_sinh(f, x0, x1, n));
}

The above returns 0.5 for the ULP error, which is not surprising, given the extra wide sinhl().

chux - Reinstate Monica · Accepted Answer

How to calculate exp(x)/2?

To avoid overflow of exp(x) yet e^x/2 is in double range, use exp(x - v)*exp(v)/2 with a carefully chosen v (example: 0x1.62e4307c58800p-1) to minimize error generation.

v has two special properties:

Subtraction of x in this range by v is exact.
Mathematical e^v and as a double are close (within 0.001 ULP or better).

Using the math identity:

e^x/2 = e^{(x - v)}*e^v/2

We could select constant v = log_e(2), (or v == 0.693...) then

e^x/2 = e^{(x - log(2))}*2/2

double my_sinh_f2(double x) {
  assert(x >= 709);
  static const double v = 0.69314718055994530941; // log(2)
  static const double scale = 2.0/2;
  return exp(x - v)*scale;
}

x0:709.783, x1:710.476 n:1000000007 error:495.895(ulp)

Then we will have succeeded is eliminating long double, yet the ULP error jumps to 495.

This is because the error in the subtraction is multiplied by the magnitude of x for the exp(x) function.

Instead, let us select a v that is at least log_e(2), yet a multiple of the ULP of x or next_power_of_2(709)/pow(2,53) or 1024/9,007,199,254,740,992.

ulp = 1024.0/pow(2,53);
v = ceil(log(2)/ulp)*ulp; // 0.69314718056000401...

double my_sinh_f3(double x) {
  assert(x >= 709);
  // Note 10 LSbits are zero -----------vvv
  static const double v = 0x1.62e42fefa3c00p-1; // 0.69314718056000401...; // ceil(log(2)*pow(2,43))/pow(2,43)
  static const double scale = 2.000000000000117415215/2; // exp(v)/2
  return exp(x - v)*scale;
}

Now the ULP error has reduced to 1.792 as the subtraction is exact. The final error is exp() (hopefully quite small like < 1.0) plus the conversion of e^v to a double (about 0.5) and the multiplication error (another 0.5).

x0:709.783, x1:710.476 n:1000000007 error:1.79199(ulp)

Yet can we do better?

What if the extended precision value of `exp(v2)` was of the form `0x1.000000xxxxxxx000...p+1`?

With v1 = 0.69314718056000401..., e^v1 is 2.0000000000001174... or written in hexadecimal with extended precision as 0x1.0000000000108654...p+1, yet it is only the significant (53 binary digits) or 0x1.0000000000109p+1 that is used as a double constant and that remaining -0x0.0000000000000346...p+1 is the error in forming the scale constant.

We could choose an offset v such that its e^v has much less error in its double representation. Then the value used as a double would be 0x1.000000xxxxxxxp+1, which is more than a 1000 times closer to exp(v2)/2 than 0x1.0000000000109p+1 is to exp(v1)/2.

After testing for values above log(2) and are a multiple of the ULP of 709 (so the subtraction is exact), a value of v == 0.69314719694034466 leads to: scale == 0x1.000000465a709000p+1/2 which has 11 mores bits as zero as compared to the double precision: scale == 0x1.000000465a709p+1/2.

double my_sinh_f4(double x) {
  assert(x >= 709);
  // Note 10 LSbits are zero     -----------vvv
  static const double v     = 0x1.62e4307c58800p-1; // 0.69314719694034466;
  static const double scale = 0x1.000000465a709000p+1/2; // exp(v)/2
  //                                           ^^^
  // Note next 11 bits of e**v, found using long double, are zero.
  return exp(x - v)*scale;
}

So with this v offset, no subtraction error occurs and the scale used is 1000 times closer to e^v than our prior attempt.

Boo-yah!

x0:709.783, x1:710.476 n:1000000007 error:1.00781(ulp).

How does our 1.00781(ulp) compare to the library function sinh()? Below reports we are nearly a whole 1.0 ULP better.

// sinh
x0:709.783, x1:710.476 n:1000000007 error:1.91992(ulp)

Further testing of library sinh() indicates for large x up to 709.783 or so, its ULP is quite good: 0.6 and then has a ULP jump to 1.920 in the range of this question. I suspect my gcc library sinh() would benefit with the similar constant change.

Notes:

float: A similar investigation for float resulted in: v = 0x1.639900p-1f; scale = 0x1.005a780p+0f; and a ULP better than my gcc library sinhf() in the end-range.

sinhl() is used as the reference function for a 64-bit extended precision correct answer. A deeper analysis would use a proven reference function.

Compiler notes:

Building file: ../sinh.c
Invoking: Cygwin C Compiler
gcc -std=c11 -O0 -g3 -pedantic -Wall -Wextra -Wconversion -Wlogical-op -Wsign-conversion -Wwrite-strings -c -fmessage-length=0 -v -MMD -MP -MF"sinh.d" -MT"sinh.o" -o "sinh.o" "../sinh.c"
Using built-in specs.
COLLECT_GCC=gcc
Target: x86_64-pc-cygwin
Configured with: /mnt/share/cygpkgs/gcc/gcc.x86_64/src/gcc-12.4.0/configure --srcdir=/mnt/share/cygpkgs/gcc/gcc.x86_64/src/gcc-12.4.0 --prefix=/usr --exec-prefix=/usr --localstatedir=/var --sysconfdir=/etc --docdir=/usr/share/doc/gcc --htmldir=/usr/share/doc/gcc/html -C --build=x86_64-pc-cygwin --host=x86_64-pc-cygwin --target=x86_64-pc-cygwin --without-libiconv-prefix --without-libintl-prefix --libexecdir=/usr/lib --with-gcc-major-version-only --enable-shared --enable-shared-libgcc --enable-static --enable-version-specific-runtime-libs --enable-bootstrap --enable-__cxa_atexit --enable-clocale=newlib --with-dwarf2 --with-tune=generic --enable-languages=ada,c,c++,fortran,lto,objc,obj-c++,jit --enable-graphite --enable-threads=posix --enable-libatomic --enable-libgomp --enable-libquadmath --enable-libquadmath-support --disable-libssp --enable-libada --disable-symvers --disable-multilib --with-gnu-ld --with-gnu-as --with-cloog-include=/usr/include/cloog-isl --without-libiconv-prefix --without-libintl-prefix --with-system-zlib --enable-linker-build-id --with-default-libstdcxx-abi=gcc4-compatible --enable-libstdcxx-filesystem-ts
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.4.0 (GCC)
COLLECT_GCC_OPTIONS='-std=c11' '-O0' '-g3' '-Wpedantic' '-Wall' '-Wextra' '-Wconversion' '-Wlogical-op' '-Wsign-conversion' '-Wwrite-strings' '-c' '-fmessage-length=0' '-v' '-MMD' '-MP' '-MF' 'sinh.d' '-MT' 'sinh.o' '-o' 'sinh.o' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-pc-cygwin/12/cc1.exe -quiet -v -MMD sinh.d -MF sinh.d -MP -MT sinh.o -dD -idirafter /usr/lib/gcc/x86_64-pc-cygwin/12/../../../../lib/../include/w32api -idirafter /usr/lib/gcc/x86_64-pc-cygwin/12/../../../../x86_64-pc-cygwin/lib/../lib/../../include/w32api ../sinh.c -quiet -dumpbase sinh.c -dumpbase-ext .c -mtune=generic -march=x86-64 -g3 -O0 -Wpedantic -Wall -Wextra -Wconversion -Wlogical-op -Wsign-conversion -Wwrite-strings -std=c11 -version -fmessage-length=0 -o /cygdrive/c/Users/TPC/AppData/Local/Temp/ccNNmUyM.s
GNU C11 (GCC) version 12.4.0 (x86_64-pc-cygwin)
    compiled by GNU C version 12.4.0, GMP version 6.3.0, MPFR version 4.2.1, MPC version 1.3.1, isl version isl-0.27-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/local/include"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-pc-cygwin/12/include-fixed"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-pc-cygwin/12/../../../../x86_64-pc-cygwin/include"
ignoring duplicate directory "/usr/lib/gcc/x86_64-pc-cygwin/12/../../../../x86_64-pc-cygwin/lib/../lib/../../include/w32api"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-pc-cygwin/12/include
 /usr/include
 /usr/lib/gcc/x86_64-pc-cygwin/12/../../../../lib/../include/w32api
End of search list.
GNU C11 (GCC) version 12.4.0 (x86_64-pc-cygwin)
    compiled by GNU C version 12.4.0, GMP version 6.3.0, MPFR version 4.2.1, MPC version 1.3.1, isl version isl-0.27-GMP

How can I compute `exp(x)/2` when `x` is large?

Tags:

c

floating-point

exp

hyperbolic-function

chux - Reinstate Monica

1 Answers

What if the extended precision value of `exp(v2)` was of the form `0x1.000000xxxxxxx000...p+1`?

Boo-yah!

chux - Reinstate Monica

Recent Activity

Donate For Us

How can I compute `exp(x)/2` when `x` is large?

Tags:

c

floating-point

exp

hyperbolic-function

chux - Reinstate Monica

1 Answers

What if the extended precision value of exp(v2) was of the form 0x1.000000xxxxxxx000...p+1?

Boo-yah!

chux - Reinstate Monica

Related questions

Recent Activity

Donate For Us

What if the extended precision value of `exp(v2)` was of the form `0x1.000000xxxxxxx000...p+1`?