 

Proof of impossibility (or counterexample) for tan(x) = infinity, for floating-point values

The POSIX page of the tan function family (tan, tanf, tanl) in C says that:

If the correct value would cause overflow, a range error shall occur and tan(), tanf(), and tanl() shall return ±HUGE_VAL, ±HUGE_VALF, and ±HUGE_VALL, respectively, with the same sign as the correct value of the function.

However, in practice it is very difficult to actually obtain infinity/-infinity in such cases, since the floating-point number would need to be sufficiently close to π/2 such that its tangent would be larger than the maximum representable floating-point value of that type.

Empirically, I am unable to obtain such results with a standard glibc, even using the nextafter functions to obtain the closest possible values to π/2 (using atan(1) * 2 as "half π", and proceeding from there, either to the left or the right of it).
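A sketch of that kind of probe in Python (math.nextafter needs Python 3.9+; the exact tangent values depend on the platform's libm, so this demonstrates finiteness on one machine rather than proving it):

```python
import math

# Sweep the doubles immediately surrounding π/2 and record the largest
# |tan| seen; on common libms this peaks around 1.6e16, far from overflow.
half_pi = math.atan(1.0) * 2          # nearest double to π/2
x = half_pi
for _ in range(10):                   # start 10 ulps to the left of π/2
    x = math.nextafter(x, 0.0)
worst = 0.0
for _ in range(21):                   # cross over π/2 to 10 ulps on the right
    worst = max(worst, abs(math.tan(x)))
    x = math.nextafter(x, math.inf)
print(worst, math.isinf(worst))
```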

Exhaustive testing of all (32-bit) floats confirms this is true for 32 bits, at least with my library. Testing the vicinity of π/2 for doubles and long doubles suggests this is also the case for them. However, exhaustive testing takes too long there: I'd need to test the vicinity of (2k+1)·π/2 for every k ∈ ℤ.

So, is there some mathematical or practical argument that allows one to conclude that, at least with "reasonably correct" library implementations (e.g., with some measured ULP bounds as is done for the GNU C library math functions), the precision of floating-point values will always be such that tan(x) will fit in a finite representation of these values? In other words, that tan does not grow faster towards infinity than we can get closer to π/2?

Note that I'm excluding tan(NAN) and tan(INFINITY) from the discussion, since these are documented corner cases which return NaN. Also, it might be possible to obtain different results with subnormal numbers, but since they only occur near zero and not near π/2, I believe we can exclude them.

Therefore, I'm looking for either some mathematical argument/proof/exhaustive test that shows this does not happen, or simply a counterexample where it does, with any standard C library implementation, such that including <math.h> and calling tan does it; but excluding specific libraries with non-standard tan-like functions.

asked May 11 '21 07:05 by anol


2 Answers

Here's a practical demonstration that a correctly-rounded tan(x) is never infinite for any finite IEEE 754 binary64 double x; in fact, it can never exceed 2.2e+18. The same approach would work for a 64-bit precision long double type, but the search would take a little longer. It should even be feasible for the IEEE 754 binary128 and binary256 floating-point formats. Going beyond that, heuristic arguments suggest that it's extremely unlikely that tan(x) could overflow for any IEEE 754 binary floating-point format, but I'm not aware of any proof, and I suspect no known proof exists.

Clearly it's enough to consider only positive finite floats (here and below, "float" means IEEE 754 binary64 floating-point value). Every such float has the form m * 2^e for some integers m and e with 0 < m < 2^53 and -1074 <= e <= 971.

So we want to find integers n, m and e minimising the absolute value |n π/2 - m 2^e|, subject to the constraints 0 < n, 0 < m < 2^53 and -1074 <= e <= 971. We're actually only interested in odd values of n, but the optimal n will already be odd: if it were even, we could halve n and decrement e to halve the absolute value. Moreover, for e < -53 every candidate m 2^e is smaller than 1/2, and hence more than 1 away from the nearest odd multiple of π/2, so we can further restrict to -53 <= e <= 971.

For any particular fixed value of e, our problem is equivalent to finding positive integers m and n that minimise the absolute value |n - m 2^(e+1)/π|, subject to the constraint m < 2^53. If we can solve this problem efficiently, we can iterate over the 1025 distinct values of e to pull out the overall best solution.

But the subproblem for fixed exponent e can be solved by finding the continued-fraction expansion of 2^(e+1) / π: standard number-theoretic results state that the minimum of |n - m 2^(e+1) / π| is guaranteed to come from either a convergent or a semi-convergent n / m to 2^(e+1) / π. All we have to do is find the best convergent or semi-convergent whose denominator is strictly smaller than 2^53. In practice, the continued-fraction expansion lets us find both the best convergent or semi-convergent (subject to the constraint that the denominator is less than 2^53) that's strictly greater than 2^(e+1) / π, and separately the best one (with similarly constrained denominator) that's strictly smaller than 2^(e+1) / π; a final comparison determines which of the two is closer.
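As a cross-check on the number theory, Python's standard library already implements this best-rational-approximation search via convergents and semi-convergents, in Fraction.limit_denominator; for instance it recovers the classical 355/113 approximation to π:

```python
from fractions import Fraction
import math

# Best rational approximation to (the double nearest) π with
# denominator at most 113 — the classical Milü approximation 355/113.
best = Fraction(math.pi).limit_denominator(113)
print(best)  # 355/113
```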

Here's some Python code that implements the above, and ends up finding the same best approximation already mentioned in the comments. First, a handful of imports that we'll need later on:

from fractions import Fraction
from math import floor, ldexp

We'll need an approximation for π/2. In fact, we'll need a pair of approximations: a lower and an upper bound. We'll use the first 1000 digits of π, since they're well documented and easy to find online (though it turns out that in practice we actually only need around 330 digits). With those digits, we create Fraction instances HALF_PI_LOWER and HALF_PI_UPPER representing tight lower and upper bounds for π/2:

# Approximation to π, given by the first 1000 digits after the point. (source:
# https://mathshistory.st-andrews.ac.uk/HistTopics/1000_places/)
PI_APPROX = Fraction("".join("""
3.14159265358979323846264338327950288419716939937510
  58209749445923078164062862089986280348253421170679
  82148086513282306647093844609550582231725359408128
  48111745028410270193852110555964462294895493038196
  44288109756659334461284756482337867831652712019091
  45648566923460348610454326648213393607260249141273
  72458700660631558817488152092096282925409171536436
  78925903600113305305488204665213841469519415116094
  33057270365759591953092186117381932611793105118548
  07446237996274956735188575272489122793818301194912
  98336733624406566430860213949463952247371907021798
  60943702770539217176293176752384674818467669405132
  00056812714526356082778577134275778960917363717872
  14684409012249534301465495853710507922796892589235
  42019956112129021960864034418159813629774771309960
  51870721134999999837297804995105973173281609631859
  50244594553469083026425223082533446850352619311881
  71010003137838752886587533208381420617177669147303
  59825349042875546873115956286388235378759375195778
  18577805321712268066130019278766111959092164201989
""".split()))

# True value of π is within ERROR of the value given above.
ERROR = Fraction("1e-1000")

HALF_PI_LOWER = (PI_APPROX - ERROR) / 2
HALF_PI_UPPER = (PI_APPROX + ERROR) / 2

Now for the continued fraction machinery. First, we give a generator function that generates the sequence of continued fraction coefficients for a value when provided with lower and upper bounds for that value. Essentially this computes the coefficients for the lower and upper bounds separately, and while those coefficients match they're yielded to the caller. As soon as they don't match we don't have enough information to determine the next coefficient, and the function will raise an exception at that point (either RuntimeError or ZeroDivisionError). We don't much care which exception is raised: we're relying on our original approximations to be accurate enough that we never hit that exception.

def continued_fraction_coefficients(lower: Fraction, upper: Fraction):
    """
    Generate continued fraction coefficients for a positive number.

    Given rational lower and upper bounds for a positive rational or irrational
    number, generate the known coefficients of the continued fraction
    representation of that number. The tighter the lower and upper bounds, the
    more coefficients will be known.

    Raises RuntimeError or ZeroDivisionError when there's insufficient
    information to determine the next coefficient.
    """
    while (q := floor(lower)) == floor(upper):
        yield q
        lower, upper = 1/(upper - q), 1/(lower - q)
    raise RuntimeError("Insufficient resolution")

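To sanity-check the recurrence, here's a self-contained run of the same floor-and-invert loop on rational bounds for π; the expected prefix [3, 7, 15, 1, 292, 1] is the well-known continued fraction of π:

```python
from fractions import Fraction
from math import floor

# Rational bounds bracketing π, 16 decimal digits wide — tight enough
# to pin down the first six continued-fraction coefficients.
lower = Fraction(31415926535897932, 10**16)   # just below π
upper = Fraction(31415926535897933, 10**16)   # just above π
coeffs = []
while len(coeffs) < 6 and (q := floor(lower)) == floor(upper):
    coeffs.append(q)
    lower, upper = 1 / (upper - q), 1 / (lower - q)
print(coeffs)  # [3, 7, 15, 1, 292, 1]
```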
Next, here's a function that consumes this sequence of continued fraction coefficients to produce best lower and best upper approximations (for a given limit on the denominator) to the corresponding value. It generates successive convergents from the continued fraction sequence, and after the denominator limit is exceeded it backtracks to find the best bounds that fit within the denominator limit.

def best_bounds(coefficients, max_denominator):
    """
    Find best lower and upper bounds under a given denominator bound.

    *coefficients* is the sequence of continued fraction coefficients
    for the value we're approximating.

    *max_denominator* is the limit on the denominator.

    Returns the best lower approximation and the best upper approximation
    to the value being approximated, for the given limit on the denominator.
    """
    # a/b and c/d are the closest convergents so far
    # sign is 1 if a/b < value < c/d, and -1 otherwise.

    a, b, c, d, sign = 0, 1, 1, 0, 1
    for q in coefficients:
        a, b, c, d, sign = c, d, a + q * c, b + q * d, -sign
        if max_denominator < d:
            correction = (max_denominator - d) // b
            c, d = c + correction * a, d + correction * b
            break

    if sign == 1:
        return Fraction(a, b), Fraction(c, d)
    else:
        return Fraction(c, d), Fraction(a, b)

Now we write a dedicated function that, for a fixed exponent e, minimises |n π/2 - m 2^e| over all n and m with m < 2^53. It returns two triples (x, n, error), where x is the floating-point value m 2^e, represented as a Python float, and error gives the error in the approximation, also converted to a float. (So the error values aren't perfectly exact, but they're close enough to let us find the best overall approximation.) The two triples represent the best lower and upper bounds.

def best_for_given_exponent(e):
    """
    Find best lower and upper approximations for 2^e / (π/2)
    with denominator smaller than 2**53.

    Returns two triples of the form (x, n, error), where:

    - x is a float close to an integer multiple of π/2
    - n is the coefficient of that integer multiple of π/2
    - error is the absolute error |x - n π/2|, rounded to the nearest float.

    The first triple gives a best lower approximation for a multiple of π/2,
    while the second triple gives a best upper approximation.
    """
    twopow = Fraction(2)**e
    coefficients = continued_fraction_coefficients(twopow/HALF_PI_UPPER, twopow/HALF_PI_LOWER)
    best_lower, best_upper = best_bounds(coefficients, max_denominator=2**53 - 1)

    n, d = best_lower.as_integer_ratio()
    lower_result = ldexp(d, e), n, float(d * twopow - n * HALF_PI_LOWER)

    n, d = best_upper.as_integer_ratio()
    upper_result = ldexp(d, e), n, float(n * HALF_PI_UPPER - d * twopow)

    return lower_result, upper_result

Finally, we put this all together to find the overall best binary64 approximation to an odd multiple of π/2.

all_results = [
    result for e in range(-53, 972)
    for result in best_for_given_exponent(e)
]
x, n, error = min(all_results, key=lambda result: result[2])

print(f"Best approximation: {x.hex()} is an approximation to {n}*π/2 with approximate error {error}")

When I run this on my machine, it gives the following output (after around 5 seconds of computation):

Best approximation: 0x1.6ac5b262ca1ffp+849 is an approximation to 3386417804515981120643892082331156599120239393299838035242121518428537554064774221620930267583474709602068045686026362989271814411863708499869721322715946622634302011697632972907922558892710830616034038541342154669787134871905353772776431251615694251273653*π/2 with approximate error 4.687165924254628e-19

So the overall worst case is the floating-point number 0x1.6ac5b262ca1ffp+849, which matches the value 6381956970095103•2^797 quoted by Eric Postpischil in a comment on the original question:

>>> float.fromhex('0x1.6ac5b262ca1ffp+849') == ldexp(6381956970095103, 797)
True

The tangent of that value will be roughly ±1/4.687165924254628e-19, or 2.133485385753704e+18, and indeed that's the value that the math library on my machine produces:

>>> from math import tan
>>> tan(float.fromhex('0x1.6ac5b262ca1ffp+849'))
-2.133485385753704e+18

Thus no finite binary64 float has a tangent exceeding 2.133485385753704e+18.

answered Nov 15 '22 12:11 by Mark Dickinson


... is there some mathematical or practical argument that allows one to conclude that, ... the precision of floating-point values will always be such that tan(x) will fit in a finite representation of these values?

π is an irrational number, and so is every some_odd_integer*π/2, which is where the mathematical tangent function has its poles.

All finite floating-point values are rational numbers. Thus no double is an odd integer multiple of π/2, and tan(some_finite_double) never lands exactly on a pole.
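For instance, the nearest double to π/2 misses the pole by about 6.1e-17, so its tangent is huge but finite (the exact output depends on the platform's libm):

```python
import math

x = math.pi / 2   # the nearest double to π/2 — a rational, not π/2 itself
t = math.tan(x)
# |t| ≈ 1/(π/2 - x) ≈ 1.6e16: large, but nowhere near DBL_MAX ≈ 1.8e308.
print(t, math.isfinite(t))
```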

some_odd_n*π/2 may lie very near a double a, such as a = 6381956970095103 · 2^797. When this happens, tan(a) is large.

This is close to K.C. Ng's "ARGUMENT REDUCTION FOR HUGE ARGUMENTS: Good to the Last Bit", chapter 2.3. That analysis roughly implies a closest-case difference of about 2^-62 from some_odd_n*π/2 over all doubles. A value a that is off from some_odd_n*π/2 by 2^-62 results in a tan() of about 2^62, well within the double range of 2^1023.

The key point is that for some FP type to have tan(a) exceed FP_MAX, a must agree with some_odd_n*π/2 to roughly log2(FP_MAX) bits. That is very unlikely for wider FP types, since precision grows linearly with bit width while range grows exponentially. It could conceivably happen with contrived FP types narrower than binary16.


Note: tan()'s first step is argument reduction to the primary range [-π/2...π/2]. This is, IMO, harder to do well than the primary tan() evaluation itself. Thus a poor-quality tan() implementation may yield dramatically larger (and incorrect) results for values near some_odd_n*π/2.
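An illustration of why the reduction step matters: reducing a huge argument with the rounded double approximation of π (here via fmod) destroys all phase accuracy, because math.pi's rounding error gets multiplied by the huge quotient. The two results below typically disagree completely; the exact values depend on the platform's libm.

```python
import math

x = 1e22                                  # about 3.2e21 half-periods of tan
careful = math.tan(x)                     # libm reduces x with extended-precision π
naive = math.tan(math.fmod(x, math.pi))   # reduces with the *rounded* double π
# math.pi is off from π by ~1.2e-16; multiplied by the quotient x/π ≈ 3.2e21,
# that's a phase error of ~4e5 radians, so `naive` bears no relation to tan(x).
print(careful, naive)
```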


I'm looking for either some mathematical argument/proof/exhaustive test that shows this does not happen,

See "using the continued fraction related to π" (ref 6) in the article linked above for ideas about how the worst case was determined.

answered Nov 15 '22 11:11 by chux - Reinstate Monica