Converting an arbitrary-precision rational number (OCaml, zarith) to an approximate floating number

Question

I'm using the Zarith library to do arbitrary-precision rational arithmetic. Suppose I have a rational number q of type Q.t that is the ratio of two big integers (Q is Zarith's arbitrary-precision rational number module). Sometimes, I would like to print this number as a floating point number for readability, and sometimes I need to convert this number floating-point for later non-arbitrary-precision calculations. Is there a way to convert q to a floating-point number up to a certain level of precision?

The way I am converting q to floating-point now has no guarantees, and can create undefined floating-point numbers (Z is the arbitrary-precision integer module):

let to_float q =
  let n, d = num q, den q in
  (* check if d is zero and raise an error if it is *)
  let nf, df = Z.to_float n, Z.to_float d in
  nf /. df

Is there a better way to handle this, where I can obtain a floating-point that most accurately approximates any q?

Edit

I quickly wrote up Mark Dickinson's answer in OCaml, if anyone is interested. It can probably (definitely) be improved and cleaned up. I'll edit if I do so or if anyone has any advice for improvements. But for now this has solved my problem!

let to_float q = 
  let n, d = num q, den q in
  let n_sign = Z.sign n in
  let d_sign = Z.sign d in (* always >= 0 *)
  if d_sign = 0 then raise Division_by_zero;
  let n = Z.abs n in
  if n_sign = 0 then 0. else
    let shift = (Z.numbits n) - (Z.numbits d) - 55 in
    let is_subnormal = shift < -1076 in
    let shift = if is_subnormal then -1076 else shift in
    let d = if shift >= 0 then Z.shift_left d shift else d in
    let n = if shift < 0 then Z.shift_left n (-shift)
      else n in
    let quotient, remainder = Z.div_rem n d in
    let quotient = if (Z.compare remainder (Z.zero)) = 0 && Z.is_even quotient then
        Z.add Z.one quotient else quotient in
    let quotient = if not is_subnormal then quotient else
        let round_select = Z.to_int @@ Z.rem quotient @@ Z.of_int 8 in
        Z.add quotient [|Z.zero;Z.minus_one;Z.of_int (-2);Z.one;Z.zero
                        ;Z.minus_one;Z.of_int 2;Z.one|].(round_select)
    in
    let unsigned_res = ldexp (Z.to_float quotient) shift in                                                                                                             
    if n_sign = 1 then unsigned_res else -.unsigned_res

I'll look into writing an interface for GMP's mpq_get_d function later, but I'm not entirely sure how to do that. The only way I see how to do that is to convert q : Q.t to a string and pass that to:

int mpq_set_str (mpq_t rop, const char *str, int base)

Does anyone know how to then pass rop to mpq_get_d in OCaml or have a reference that describes how to do this? I looked through chapter 19 of RWO and didn't see a situation like this.

Mark Dickinson · Accepted Answer

If you have access to

an integer log2 operation, and
the ability to left shift an integer by a given number of bits

then it's relatively easy to roll your own correctly rounded conversion. In a nutshell, the method looks like this:

Reduce to the case n > 0, d > 0; filter out obvious underflow/overflow
Choose an integer shift so that 2^-shift*n/d lies between 2^54 and 2^56.
Use integer arithmetic to compute x = 2^-shift*n/d, rounded to the nearest integer using the round-to-odd rounding method.
Convert x to the nearest IEEE 754 double-precision value dx, with the usual round-ties-to-even rounding mode.
Return ldexp(dx, shift).

I'm afraid I'm not fluent in OCaml, but the following Python code illustrates the idea for positive inputs. I leave it to you to make the obvious modifications for negative inputs and for division by zero. You might also want to make an early return for cases of extreme overflow and underflow: those are easy to detect by looking for extra large or small values of shift below.

from math import ldexp

def to_float(numerator, denominator):
    """
    Convert numerator / denominator to float, correctly rounded.

    For simplicity, assume both inputs are positive.
    """
    # Shift satisfies 2**54 < (numerator / denominator) / 2**shift < 2**56
    shift = numerator.bit_length() - denominator.bit_length() - 55

    # Divide the fraction by 2**shift.
    if shift >= 0:
        denominator <<= shift
    else:
        numerator <<= -shift

    # Convert to the nearest integer, using round-to-odd.
    q, r = divmod(numerator, denominator)
    if r != 0 and q % 2 == 0:
        q += 1

    # Now convert to the nearest float and shift back.
    return ldexp(float(q), shift)

Some notes:

the bit_length method on a positive integer n gives the number of bits required to represent n, or in other words 1 + floor(log2(n)).
divmod is a Python function that computes the quotient and remainder of an integer division at the same time.
the quantity q fits (easily) in a 64-bit integer
we're rounding twice: once when converting the shifted numerator / denominator to the nearest integer, and again when rounding that integer to a float. The first round uses the round-to-odd method; this ensures that the second round (implicit in the conversion from int to float) gives the same result as if we'd rounded the fraction directly to a float.
the above algorithm doesn't correctly handle fractions whose converted float value is subnormal: in that case, the ldexp operation introduces potentially a third rounding. It's possible to deal with this, with some care. See below for some code.

The above is in fact a simplified version of the algorithm that Python uses when dividing one (big) integer by another to get a floating-point result. You can see the source here. The comment at the beginning of the long_true_divide function gives an overview of the method.

For completeness, here's a variant that also deals correctly with subnormal results.

def to_float(numerator, denominator):
    """
    Convert numerator / denominator to float, correctly rounded.

    For simplicity, assume both inputs are positive.
    """
    # Choose shift so that 2**54 < numerator / denominator / 2**shift < 2**56
    shift = numerator.bit_length() - denominator.bit_length() - 55

    # The 'treat_as_subnormal' flag catches all cases of subnormal results,
    # along with some cases where the result is not subnormal but *is* still
    # smaller than 2**-1021. In all these cases, it's sufficient to find the
    # closest integer multiple of 2**-1074. We first round to the nearest
    # multiple of 2**-1076 using round-to-odd.
    treat_as_subnormal = shift < -1076
    if treat_as_subnormal:
        shift = -1076

    # Divide the fraction by 2**shift.
    if shift >= 0:
        denominator <<= shift
    else:
        numerator <<= -shift

    # Convert to the nearest integer, using round-to-odd.
    q, r = divmod(numerator, denominator)
    if r != 0 and q % 2 == 0:
        q += 1

    # Now convert to the nearest float and shift back.
    if treat_as_subnormal:
        # Round to the nearest multiple of 4, rounding ties to
        # the nearest multiple of 8. This avoids double rounding
        # from the ldexp call below.
        q += [0, -1, -2, 1, 0, -1, 2, 1][q%8]

    return ldexp(float(q), shift)

Jeffrey Scofield · Answer

This isn't a complete answer, but in looking around I found that Zarith uses GMP internally. There is a GMP function named mpq_get_d that converts a rational to a double. If it's not available directly in Zarith, it should be possible (given some time) to add an interface for it.

Converting an arbitrary-precision rational number (OCaml, zarith) to an approximate floating number

Tags:

floating-point

ocaml

arbitrary-precision

rational-number

Dimitrios

2 Answers

Mark Dickinson

Jeffrey Scofield

Recent Activity

Donate For Us

Converting an arbitrary-precision rational number (OCaml, zarith) to an approximate floating number

Tags:

floating-point

ocaml

arbitrary-precision

rational-number

Dimitrios

2 Answers

Mark Dickinson

Jeffrey Scofield

Related questions

Recent Activity

Donate For Us