
Reducing Precision in Doubles in R

I'm looking for a way to consistently ignore small differences between floating point numbers in R (these are double precision floating points as per IEC 60559), by using base R tools and without resorting to C or C++. In other words, I would like to "round" the significand portion of the double precision floating point numbers such that things like this return TRUE instead of FALSE:

1.45 - .55 == 2.45 - 1.55
## [1] FALSE

Something like:

round_significand(1.45 - .55, bits=48) == round_significand(2.45 - 1.55, bits=48)
## [1] TRUE

A simple round doesn't work because the level to which we need to round depends on the magnitude of the number.
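To see why, compare a value near 1 with a tiny value (illustrative numbers chosen for this example):

```r
# Near 1, rounding to 10 decimal places absorbs the representation noise:
round(1.45 - .55, 10) == round(2.45 - 1.55, 10)
## [1] TRUE

# But for a tiny number, the same rounding destroys the entire value,
# so the number of digits to keep must depend on the magnitude:
round(1e-20 / 3, 10) == 0
## [1] TRUE
```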

data.table does something of the sort internally, from ?setNumericRounding:

Computers cannot represent some floating point numbers (such as 0.6) precisely, using base 2. This leads to unexpected behaviour when joining or grouping columns of type 'numeric'; i.e. 'double', see example below. In cases where this is undesirable, data.table allows rounding such data up to approximately 11 s.f. which is plenty of digits for many cases. This is achieved by rounding the last 2 bytes off the significand. Other possible values are 1 byte rounding, or no rounding (full precision, default).
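A rough sketch of that byte-level idea is possible in base R with writeBin()/readBin(), which expose the raw bytes of a double. Note this truncates (rather than rounds) the low-order significand bytes, and the byte positions assume a little-endian platform; it is an illustration of the technique, not data.table's actual implementation:

```r
# Zero the low-order `bytes` bytes of each double's significand.
# On little-endian hardware these are the first bytes of each 8-byte value.
zero_low_bytes <- function(x, bytes = 2) {
  b <- writeBin(x, raw())                        # 8 raw bytes per double
  idx <- rep((seq_along(x) - 1) * 8, each = bytes) + seq_len(bytes)
  b[idx] <- as.raw(0)                            # truncate the significand
  readBin(b, numeric(), length(x))
}

zero_low_bytes(1.45 - .55) == zero_low_bytes(2.45 - 1.55)
## [1] TRUE
```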

I'm working on a hack implementation that scales everything to a decimal number x in [1, 10) (i.e. floor(log10(x)) == 0) and rounds that, e.g.:

rnd_sig <- function(x, precision=10) {
  # Scale into [1, 10), round to `precision` digits, then undo the scaling.
  # Note log10(0) is -Inf, so zeros need a guard.
  exp <- floor(log10(abs(x)))
  exp[!is.finite(exp)] <- 0        # pass zeros through unchanged
  round(x * 10 ^ (-exp), precision) * 10 ^ exp
}

but I don't know enough about floating point numbers to be sure this is safe (or when it is safe, and not).
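With that caveat, the function does make the motivating comparison come out equal (definition repeated so the snippet is self-contained):

```r
rnd_sig <- function(x, precision=10) {
  exp <- floor(log10(abs(x)))
  round(x * 10 ^ (-exp), precision) * 10 ^ exp
}

rnd_sig(1.45 - .55) == rnd_sig(2.45 - 1.55)
## [1] TRUE
```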

asked Oct 15 '25 by BrodieG

1 Answer

There is no general answer for how much a result computed with floating-point arithmetic may differ from the exact mathematical result. In general, the final error of a sequence of computations can range from zero to infinity (it may even produce Not-a-Number where an exact mathematical result exists, or a numeric result where no mathematical result is defined). Therefore, choosing a tolerance for classifying two computed results as equal requires a problem-specific solution: one must analyze the calculations and numbers involved in the specific problem and either determine bounds on the possible error or weigh the costs of accepting incorrect classifications.

The study of errors arising in numerical computation is numerical analysis, a broad field addressed by many books. No simple answer exists.

In simple situations, it may be possible to determine bounds on the errors and to show that these bounds are less than the differences between results that are known to be different. In other words, given a computation that ideally would produce results a and b but actually produces approximations a' and b', it might be possible to show that there is some bound E on the error such that |a' - b'| < E if and only if a equals b. However, it is not possible to answer this question without knowing what computations are performed and, possibly, knowing what the domain of input values is.
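For the R case in the question, note that base R already ships a relative-tolerance comparison, all.equal(), whose default tolerance (about 1.5e-8) is far larger than the one-ulp discrepancies above; whether that tolerance is appropriate is exactly the problem-specific question:

```r
# all.equal() compares numerics with a relative tolerance rather than
# exactly; isTRUE() is needed because a failing comparison returns a
# character description of the difference, not FALSE.
isTRUE(all.equal(1.45 - .55, 2.45 - 1.55))
## [1] TRUE
```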

answered Oct 17 '25 by Eric Postpischil