 

How do I specify the rounding mode for floating point numbers?

I'd like to round floating point numbers to the nearest integer, going towards positive infinity when there is a tie for "nearest integer".

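fn main() {
    // Goal: ties go toward +infinity, so -0.5 should round to -0.0.
    // But f64::round breaks ties away from zero and returns -1.0 here.
    assert_eq!(-0.0, (-0.5f64).round()); // fails!
}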

However, the docs for round say:

Round half-way cases away from 0.0.

I haven't seen anything that would allow me to change the rounding mode, but there's got to be some way, right?
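
The documented tie-breaking is easy to observe. Below is a minimal sketch in current Rust, where `round` is an inherent method on `f64`. The helper name `std_round_ties` is hypothetical. It pairs a few half-way inputs with their `round` results, showing that ties go away from zero in both directions. That yields -1.0 and -3.0 for the negative cases, where a ties-toward-+infinity rule would give -0.0 and -2.0:

```rust
/// Hypothetical helper: pairs each half-way input with `f64::round` of it.
fn std_round_ties() -> Vec<(f64, f64)> {
    [0.5f64, -0.5, 2.5, -2.5]
        .iter()
        // f64::round breaks ties away from 0.0, per its documentation.
        .map(|&x| (x, x.round()))
        .collect()
}
```

Calling `std_round_ties()` yields the pairs (0.5, 1.0), (-0.5, -1.0), (2.5, 3.0) and (-2.5, -3.0).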

Shepmaster asked Jan 10 '23 01:01



2 Answers

It appears that the implementation of Float::round, at least for f32 and f64, forwards to the roundf32/roundf64 intrinsics, which are themselves implemented using the LLVM functions llvm.round.f32 and llvm.round.f64. Sadly, the documentation for llvm.round.* doesn't say anything about how to control the rounding mode, and there doesn't appear to be anything else in the LLVM reference about it either. The other functions I could find that even mentioned rounding modes either specified one particular rounding mode or said it was undefined.

I couldn't find any solid information about this. There was a post on the LLVM mailing list from 2011 that talks about x86-specific intrinsics, and a 2013 post to the Native Client issue tracker that appears to talk about a hypothetical intrinsic and how it would be hard to do portably.

Taking a blind stab at it: I'd try writing a little C library that does it and just link to that. It doesn't appear to be directly supported in LLVM.

DK. answered Jan 16 '23 16:01



I'm afraid I don't know Rust, but I wrote the following for Julia (based on a similar sequence for ties away from zero by Arch Robinson), which you should be able to adapt:

y = floor(x)
ifelse(x==y, y, copysign(floor(2*x-y),x))
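
Since the question asks about Rust, here is a minimal sketch of the same sequence as a Rust function. It uses only the standard `f64::floor` and `f64::copysign` methods, and the name `round_half_up` is hypothetical. It adds one guard that the Julia snippet lacks. For -0.25 < x < 0 the subtraction `2*x - y` is inexact, and for negative x with |x| ≤ 2^-55 it rounds to exactly 1.0, so a literal translation would return -1.0 there instead of -0.0. The guard is one possible patch for that range, added here as an assumption:

```rust
/// Rounds `x` to the nearest integer, breaking ties toward +infinity.
/// Hypothetical helper: a translation of the Julia sequence, plus a guard.
fn round_half_up(x: f64) -> f64 {
    // Largest integer <= x.
    let y = x.floor();
    if x == y {
        // x is already integral (this includes +/-0.0 and +/-infinity).
        y
    } else if x > -0.25 && x < 0.0 {
        // Here 2.0 * x - y is inexact and can round up to exactly 1.0 for
        // tiny x; the correctly rounded result is always -0.0 in this range.
        -0.0
    } else {
        // With x = y + f (0 < f < 1), 2x - y = y + 2f, whose floor is y
        // when f < 0.5 and y + 1 when f >= 0.5 (ties go up).
        // copysign keeps the sign of x, so e.g. -0.5 gives -0.0.
        (2.0 * x - y).floor().copysign(x)
    }
}
```

With it, `round_half_up(-0.5)` and `round_half_up(-1e-20)` both return -0.0, so the question's `assert_eq!(-0.0, ...)` check passes, and `is_sign_negative()` confirms the sign.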

A quick explanation of what is going on:

  1. floor finds the nearest integer less than or equal to x.
  2. If y==x, then x is an integer, so no rounding is necessary: note that this captures all cases where the absolute value of x is greater than 2^53.
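  3. floor(2*x-y) will give the desired answer: 2*x is exact, and overflow is not a concern because any x reaching this step is not an integer, so its absolute value is below 2^53 (step 2). The subtraction will be exact for all cases except -0.25 < x < 0. There the exact value of 2*x-y lies strictly between 0.5 and 1, so it usually still floors to 0, but for negative x with |x| ≤ 2^-55 the subtraction rounds up to exactly 1.0 and the sequence returns -1.0 instead of -0.0.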
  4. The copysign is there just to ensure the zero has the correct sign. If you're not bothered by such things, you could leave it off.
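
Step 3 is the heart of the trick. Write x as its floor plus a fractional part, and assume exact arithmetic (the floating-point exceptions are the ones step 3 discusses). A short derivation then shows why floor(2*x-y) is round-to-nearest with ties going up:

```latex
% Write x = y + f with y = \lfloor x \rfloor and 0 < f < 1 (f = 0 is step 2).
2x - y = y + 2f, \qquad
\lfloor 2x - y \rfloor = y + \lfloor 2f \rfloor =
\begin{cases}
  y     & \text{if } 0 < f < \tfrac{1}{2},\\
  y + 1 & \text{if } \tfrac{1}{2} \le f < 1.
\end{cases}
```

A tie (f = 1/2) always lands on y + 1, the larger neighbour, whatever the sign of x. For example, x = 2.5 gives y = 2 and 2x - y = 3, so the result is 3. Likewise x = -0.5 gives y = -1 and 2x - y = 0, and the copysign turns that 0 into -0.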

Steps 2 & 3 could be replaced by a simple branch:

x-y < 0.5 ? y : y+1.0
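
The same branch can be written in Rust. Again, this is a minimal sketch with a hypothetical name. Unlike the copysign sequence, it returns +0.0 rather than -0.0 for negative inputs in [-0.5, 0):

```rust
/// Rounds `x` to the nearest integer, breaking ties toward +infinity.
/// Hypothetical helper mirroring `x-y < 0.5 ? y : y+1.0`.
fn round_half_up_branch(x: f64) -> f64 {
    let y = x.floor();
    // x - y is the fractional part of x; for tiny negative x it may round
    // up to 1.0, which still (correctly) selects y + 1.0.
    if x - y < 0.5 { y } else { y + 1.0 }
}
```

Since `-0.0 == 0.0` under IEEE 754 comparison, an `assert_eq!` against `-0.0` cannot tell the two results apart; checking `is_sign_negative()` can.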

but if I recall correctly, avoiding the branch here made the code more vectorisation friendly (which is also the reason for using ifelse instead of an if block).

Simon Byrne answered Jan 16 '23 16:01
