Change floating point rounding mode

What is the most efficient way to change the rounding mode* of IEEE 754 floating point numbers? A portable C function would be nice, but a solution that uses x86 assembly is ok too.

*I am referring to the standard rounding modes: to nearest, toward zero, and toward positive/negative infinity.

Jeff Linahan asked Jul 29 '11 01:07


1 Answer

This is the standard C solution:

#include <fenv.h>
#pragma STDC FENV_ACCESS ON

// store the original rounding mode
const int originalRounding = fegetround( );
// establish the desired rounding mode
fesetround(FE_TOWARDZERO);
// do whatever you need to do ...

// ... and restore the original mode afterwards
fesetround(originalRounding);

On backwards platforms lacking C99 support, you may need to resort to assembly. In this case, you may want to set the rounding for both the x87 unit (via the fldcw instruction) and SSE (via the ldmxcsr instruction).
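
If fenv.h is not available, the two instructions mentioned above can be wrapped roughly like this. This is a sketch that assumes GCC/Clang-style inline assembly on an x86 CPU with SSE. The names x86_set_rounding, x86_get_rounding and the RC_* constants are hypothetical. The two-bit rounding-control field sits in bits 10-11 of the x87 control word and in bits 13-14 of MXCSR, with the same encoding in both.

```c
#include <stdint.h>

/* Rounding-control encoding shared by the x87 control word and MXCSR. */
enum x86_rc { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_ZERO = 3 };

/* Set the rounding mode for both the x87 unit and SSE (assumed helper). */
void x86_set_rounding(unsigned rc)
{
    uint16_t cw;
    uint32_t mxcsr;

    rc &= 3u;                                   /* keep only the two RC bits */

    /* x87: read control word, replace bits 10-11, write it back */
    __asm__ volatile ("fnstcw %0" : "=m"(cw));
    cw = (uint16_t)((cw & ~0x0C00u) | (rc << 10));
    __asm__ volatile ("fldcw %0" : : "m"(cw));

    /* SSE: read MXCSR, replace bits 13-14, write it back */
    __asm__ volatile ("stmxcsr %0" : "=m"(mxcsr));
    mxcsr = (mxcsr & ~0x6000u) | (rc << 13);
    __asm__ volatile ("ldmxcsr %0" : : "m"(mxcsr));
}

/* Read the current SSE rounding mode back from MXCSR (assumed helper). */
unsigned x86_get_rounding(void)
{
    uint32_t mxcsr;
    __asm__ volatile ("stmxcsr %0" : "=m"(mxcsr));
    return (mxcsr >> 13) & 3u;
}
```

As with the fenv.h version, read the current mode with x86_get_rounding() before changing it and restore it when you are done.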

Edit: You don't need to resort to assembly for MSVC. You can use the (totally non-standard) _controlfp() instead:

unsigned int originalRounding = _controlfp(0, 0);
_controlfp(_RC_CHOP, _MCW_RC);
// do something ...
_controlfp(originalRounding, _MCW_RC);

You can read more about _controlfp( ) on MSDN.

And, just for completeness, a decoder ring for the macro names for rounding modes:

rounding mode    C name         MSVC name
-----------------------------------------
to nearest       FE_TONEAREST   _RC_NEAR
toward zero      FE_TOWARDZERO  _RC_CHOP
to +infinity     FE_UPWARD      _RC_UP
to -infinity     FE_DOWNWARD    _RC_DOWN
Stephen Canon answered Oct 17 '22 22:10