
Which exponentiation algorithms do CPU/programming languages use? [closed]

I've been learning about faster exponentiation algorithms (k-ary, sliding window, etc.), and was wondering which ones are used in CPUs/programming languages? (I'm fuzzy on whether this happens in the CPU or through the compiler.)

And just for kicks, which is the fastest?

Edit regarding the broadness: It's intentionally broad because I know there are a bunch of different techniques to do this. The accepted answer had what I was looking for.

asked Apr 01 '15 by Monti

1 Answer

I assume your interest is in implementation of the exponentiation functions that can be found in standard math libraries for HLLs, in particular C/C++. These include the functions exp(), exp2(), exp10(), and pow(), as well as single-precision counterparts expf(), exp2f(), exp10f(), and powf().

The exponentiation methods you mention (such as k-ary and sliding-window exponentiation) are typically employed in cryptographic algorithms, such as RSA, which is based on modular exponentiation. They are not typically used for the exponentiation functions provided via math.h or cmath. The implementation details of standard math functions like exp() differ, but a common scheme follows a three-step process:

  1. reduction of the function argument to a primary approximation interval
  2. approximation of a suitable base function on the primary approximation interval
  3. mapping back the result for the primary interval to the entire range of the function

An auxiliary step is often the handling of special cases. These can pertain to special mathematical situations such as log(0.0), or special floating-point operands such as NaN (Not a Number).

The C99 code for expf(float) below shows in exemplary fashion what those steps look like for a concrete example. The argument a is first split such that exp(a) = e^r * 2^i, where i is an integer and r is in [log(sqrt(0.5)), log(sqrt(2.0))], the primary approximation interval. In the second step, we now approximate e^r with a polynomial. Such approximations can be designed according to various criteria, such as minimizing absolute or relative error. The polynomial can be evaluated in various ways, including Horner's scheme and Estrin's scheme.

The code below uses a very common approach by employing a minimax approximation, which minimizes the maximum error over the entire approximation interval. A standard algorithm for computing such approximations is the Remez algorithm. Evaluation is via Horner's scheme; the numerical accuracy of this evaluation is enhanced by the use of fmaf().

This standard math function implements what is known as a fused multiply-add, or FMA. It computes a*b+c using the full, unrounded product a*b in the addition, applying a single rounding only at the end. On most modern hardware, such as GPUs, IBM Power CPUs, recent x86 processors (e.g. Haswell), and recent ARM processors (as an optional extension), this maps straight to a hardware instruction. On platforms that lack such an instruction, fmaf() will map to fairly slow emulation code, in which case we would not want to use it if we are interested in performance.
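To see the effect of the single rounding, consider a contrived example (the operand values below are chosen purely for illustration): with a = 1 + 2^-12 and c = -(1 + 2^-11), the exact product a*a = 1 + 2^-11 + 2^-24 rounds to 1 + 2^-11 in float, so a separate multiply and add cancels to zero, while fmaf() retains the 2^-24 term:

```c
#include <math.h>

/* Two roundings: the product is rounded to float first, then the sum.
   Storing the product in a variable prevents the compiler from
   contracting the expression into an FMA. */
float mul_then_add (float a, float b, float c)
{
    float p = a * b;  /* first rounding */
    return p + c;     /* second rounding */
}
/* fmaf (a, b, c) performs the same computation with a single rounding */
```

With the operands above, mul_then_add() returns 0.0f while fmaf() returns 2^-24, the exact answer.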

The final computation is the multiplication by 2^i, for which C and C++ provide the function ldexp(). In "industrial strength" library code one typically uses a machine-specific idiom here that takes advantage of the use of IEEE-754 binary arithmetic for float. Lastly, the code cleans up cases of overflow and underflow.
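One possible sketch of such a machine-specific idiom (my own illustration, not the exact code of any particular library): construct the float 2^i directly from its IEEE-754 bit pattern and multiply. This only covers exponents in the normal binary32 range [-126, 127]; production library code adds further scaling steps near the overflow and underflow thresholds:

```c
#include <stdint.h>
#include <string.h>

/* Multiply a by 2**i by building 2**i from its IEEE-754 bit pattern.
   Valid only for i in [-126, 127], the normal binary32 exponent range. */
float scale_by_pow2 (float a, int i)
{
    uint32_t bits = (uint32_t)(i + 127) << 23;  /* biased exponent field */
    float scale;
    memcpy (&scale, &bits, sizeof scale);       /* reinterpret bits as float */
    return a * scale;
}
```

The appeal over a generic ldexpf() call is that this compiles to a shift, an add, and one multiply, with no branches.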

The x87 FPU inside x86 processors has an instruction F2XM1 that computes 2^x-1 on [-1,1]. This can be used for the second step of the computation of exp() and exp2(). There is an instruction FSCALE, which is used to multiply by 2^i in the third step. A common way of implementing F2XM1 itself is as microcode that utilizes a rational or polynomial approximation. Note that the x87 FPU is maintained mostly for legacy support these days. On modern x86 platforms, libraries typically use pure software implementations based on SSE and algorithms similar to the one shown below. Some combine small tables with polynomial approximations.

pow(x,y) can be conceptually implemented as exp(y*log(x)), but this suffers from significant loss of accuracy when x is near unity and y is large in magnitude, as well as incorrect handling of the numerous special cases specified in the C/C++ standards. One way to get around the accuracy issue is to compute log(x) and the product y*log(x) in some form of extended precision. The details would fill an entire, lengthy separate answer, and I do not have code handy to demonstrate it. In various C/C++ math libraries, pow(double,int) and powf(float,int) are computed by a separate code path that applies the "square-and-multiply" method with bit-wise scanning of the binary representation of the integer exponent.
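A minimal sketch of that square-and-multiply code path (my own illustration; real library versions add handling for overflow and special operands): the exponent's bits are scanned from least to most significant, squaring the base at each step and multiplying it into the result whenever the current bit is set, for O(log n) multiplications:

```c
/* pow(x, n) for integer n via binary square-and-multiply (sketch) */
double powi (double x, int n)
{
    /* negate via unsigned arithmetic so that n == INT_MIN is handled */
    unsigned int m = (n < 0) ? 0u - (unsigned int)n : (unsigned int)n;
    double r = 1.0;
    while (m) {
        if (m & 1) r *= x;  /* current exponent bit set: multiply into result */
        x *= x;             /* square the base for the next bit position */
        m >>= 1;
    }
    return (n < 0) ? 1.0 / r : r;
}
```

The same bit-scanning structure, with all multiplications performed modulo N, is the baseline that the k-ary and sliding-window methods mentioned at the top improve upon.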

#include <math.h> /* import fmaf(), ldexpf(), INFINITY */

/* Like rintf(), but -0.0f -> +0.0f, and |a| must be < 2**22 */
float quick_and_dirty_rintf (float a)
{
    const float cvt_magic = 0x1.800000p+23f;
    return (a + cvt_magic) - cvt_magic;
}

/* Approximate exp(a) on the interval [log(sqrt(0.5)), log(sqrt(2.0))]. */
float expf_poly (float a)
{ 
    float r;

    r =             0x1.694000p-10f;  // 1.37805939e-3
    r = fmaf (r, a, 0x1.125edcp-07f); // 8.37312452e-3
    r = fmaf (r, a, 0x1.555b5ap-05f); // 4.16695364e-2
    r = fmaf (r, a, 0x1.555450p-03f); // 1.66664720e-1
    r = fmaf (r, a, 0x1.fffff6p-02f); // 4.99999851e-1
    r = fmaf (r, a, 0x1.000000p+00f); // 1.00000000e+0
    r = fmaf (r, a, 0x1.000000p+00f); // 1.00000000e+0
    return r;
}

/* Approximate exp2() on interval [-0.5,+0.5] */
float exp2f_poly (float a)
{ 
    float r;

    r =             0x1.418000p-13f;  // 1.53303146e-4
    r = fmaf (r, a, 0x1.5efa94p-10f); // 1.33887795e-3
    r = fmaf (r, a, 0x1.3b2c6cp-07f); // 9.61833261e-3
    r = fmaf (r, a, 0x1.c6af8ep-05f); // 5.55036329e-2
    r = fmaf (r, a, 0x1.ebfbe0p-03f); // 2.40226507e-1
    r = fmaf (r, a, 0x1.62e430p-01f); // 6.93147182e-1
    r = fmaf (r, a, 0x1.000000p+00f); // 1.00000000e+0
    return r;
}

/* Approximate exp10(a) on [log(sqrt(0.5))/log(10), log(sqrt(2.0))/log(10)] */
float exp10f_poly (float a)
{ 
    float r;

    r =             0x1.a56000p-3f;  // 0.20574951
    r = fmaf (r, a, 0x1.155aa8p-1f); // 0.54170728
    r = fmaf (r, a, 0x1.2bda96p+0f); // 1.17130411
    r = fmaf (r, a, 0x1.046facp+1f); // 2.03465796
    r = fmaf (r, a, 0x1.53524ap+1f); // 2.65094876
    r = fmaf (r, a, 0x1.26bb1cp+1f); // 2.30258512
    r = fmaf (r, a, 0x1.000000p+0f); // 1.00000000
    return r;
}

/* Compute exponential base e. Maximum ulp error = 0.86565 */
float my_expf (float a)
{
    float t, r;
    int i;

    t = a * 0x1.715476p+0f;            // 1/log(2); 1.442695
    t = quick_and_dirty_rintf (t);
    i = (int)t;
    r = fmaf (t, -0x1.62e400p-01f, a); // log_2_hi; -6.93145752e-1
    r = fmaf (t, -0x1.7f7d1cp-20f, r); // log_2_lo; -1.42860677e-6
    t = expf_poly (r);
    r = ldexpf (t, i);
    if (a < -105.0f) r = 0.0f;
    if (a >  105.0f) r = INFINITY;     // +INF
    return r;
}

/* Compute exponential base 2. Maximum ulp error = 0.86770 */
float my_exp2f (float a)
{
    float t, r;
    int i;

    t = quick_and_dirty_rintf (a);
    i = (int)t;
    r = a - t;
    t = exp2f_poly (r);
    r = ldexpf (t, i);
    if (a < -152.0f) r = 0.0f;
    if (a >  152.0f) r = INFINITY;     // +INF
    return r;
}

/* Compute exponential base 10. Maximum ulp error = 0.95588 */
float my_exp10f (float a)
{
    float r, t;
    int i;

    t = a * 0x1.a934f0p+1f;            // log2(10); 3.321928
    t = quick_and_dirty_rintf (t);
    i = (int)t;
    r = fmaf (t, -0x1.344140p-2f, a);  // log10(2)_hi // -3.01030159e-1
    r = fmaf (t, 0x1.5ec10cp-23f, r);  // log10(2)_lo //  1.63332601e-7
    t = exp10f_poly (r);
    r = ldexpf (t, i);
    if (a < -46.0f) r = 0.0f;
    if (a >  46.0f) r = INFINITY;      // +INF
    return r;
}

#include <string.h>
#include <stdint.h>

uint32_t float_as_uint32 (float a)
{
    uint32_t r;
    memcpy (&r, &a, sizeof r);
    return r;
}

float uint32_as_float (uint32_t a)
{
    float r;
    memcpy (&r, &a, sizeof r);
    return r;
}

uint64_t double_as_uint64 (double a)
{
    uint64_t r;
    memcpy (&r, &a, sizeof r);
    return r;
}

double floatUlpErr (float res, double ref)
{
    uint64_t i, j, err, refi;
    int expoRef;
    
    /* ulp error cannot be computed if either operand is NaN, infinity, zero */
    if (isnan (res) || isnan (ref) || isinf (res) || isinf (ref) ||
        (res == 0.0f) || (ref == 0.0f)) {
        return 0.0;
    }
    /* Convert the float result to an "extended float". This is like a float
       with 56 instead of 24 effective mantissa bits.
    */
    i = ((uint64_t)float_as_uint32(res)) << 32;
    /* Convert the double reference to an "extended float". If the reference is
       >= 2^129, we need to clamp to the maximum "extended float". If reference
       is < 2^-126, we need to denormalize because of the float types's limited
       exponent range.
    */
    refi = double_as_uint64(ref);
    expoRef = (int)(((refi >> 52) & 0x7ff) - 1023);
    if (expoRef >= 129) {
        j = 0x7fffffffffffffffULL;
    } else if (expoRef < -126) {
        j = ((refi << 11) | 0x8000000000000000ULL) >> 8;
        j = j >> (-(expoRef + 126));
    } else {
        j = ((refi << 11) & 0x7fffffffffffffffULL) >> 8;
        j = j | ((uint64_t)(expoRef + 127) << 55);
    }
    j = j | (refi & 0x8000000000000000ULL);
    err = (i < j) ? (j - i) : (i - j);
    return err / 4294967296.0;
}

#include <stdio.h>
#include <stdlib.h>

int main (void)
{
    double ref, ulp, maxulp;
    float arg, res, reff;
    uint32_t argi, resi, refi, diff, sumdiff;

    printf ("testing expf ...\n");
    argi = 0;
    sumdiff = 0;
    maxulp = 0;
    do {
        arg = uint32_as_float (argi);
        res = my_expf (arg);
        ref = exp ((double)arg);
        ulp = floatUlpErr (res, ref);
        if (ulp > maxulp) maxulp = ulp;
        reff = (float)ref;
        refi = float_as_uint32 (reff);
        resi = float_as_uint32 (res);
        diff = (resi < refi) ? (refi - resi) : (resi - refi);
        if (diff > 1) {
            printf ("!! expf: arg=%08x res=%08x ref=%08x\n", argi, resi, refi);
            return EXIT_FAILURE;
        } else {
            sumdiff += diff;
        }
        argi++;
    } while (argi);
    printf ("expf   maxulp=%.5f  sumdiff=%u\n", maxulp, sumdiff);
    
    printf ("testing exp2f ...\n");
    argi = 0;
    maxulp = 0;
    sumdiff = 0;
    do {
        arg = uint32_as_float (argi);
        res = my_exp2f (arg);
        ref = exp2 ((double)arg);
        ulp = floatUlpErr (res, ref);
        if (ulp > maxulp) maxulp = ulp;
        reff = (float)ref;
        refi = float_as_uint32 (reff);
        resi = float_as_uint32 (res);
        diff = (resi < refi) ? (refi - resi) : (resi - refi);
        if (diff > 1) {
            printf ("!! exp2f: arg=%08x res=%08x ref=%08x\n", argi, resi, refi);
            return EXIT_FAILURE;
        } else {
            sumdiff += diff;
        }
        argi++;
    } while (argi);
    printf ("exp2f  maxulp=%.5f  sumdiff=%u\n", maxulp, sumdiff);

    printf ("testing exp10f ...\n");
    argi = 0;
    maxulp = 0;
    sumdiff = 0;
    do {
        arg = uint32_as_float (argi);
        res = my_exp10f (arg);
        ref = exp10 ((double)arg);
        ulp = floatUlpErr (res, ref);
        if (ulp > maxulp) maxulp = ulp;
        reff = (float)ref;
        refi = float_as_uint32 (reff);
        resi = float_as_uint32 (res);
        diff = (resi < refi) ? (refi - resi) : (resi - refi);
        if (diff > 1) {
            printf ("!! exp10f: arg=%08x res=%08x ref=%08x\n", argi, resi, refi);
            return EXIT_FAILURE;
        } else {
            sumdiff += diff;
        }
        argi++;
    } while (argi);
    printf ("exp10f maxulp=%.5f  sumdiff=%u\n", maxulp, sumdiff);
    

    return EXIT_SUCCESS;
}
answered Oct 01 '22 by njuffa