Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is integer overflow in R and how can it happen?

I have some calculation going on and get the following warning (i.e. not an error):

Warning messages: 1: In sum(myvar, na.rm = T) : Integer overflow - use sum(as.numeric(.)) 

In this thread people state that integer overflows simply don't happen. Either R isn't overly modern or they are not right. However, what am I supposed to do here? If I use as.numeric as the warning suggests I might not account for the fact that information is lost way before. myvar is read form a .csv file, so shouldn't R figure out that some bigger field is needed? Does it already cut off something?

What's the max length of integer or numeric? Would you suggest any other field type / mode?

EDIT: I run:

R version 2.13.2 (2011-09-30) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) within R Studio

like image 519
Matt Bannert Avatar asked Jan 10 '12 14:01

Matt Bannert


People also ask

What is integer overflow in R?

It means that you are taking the mean [in your case, the sum -- @aix] of some very large integers, and the calculation is overflowing. It is just a warning. This will not happen in the next release of R.

How does integer overflow happen?

An integer overflow occurs when you attempt to store inside an integer variable a value that is larger than the maximum value the variable can hold. The C standard defines this situation as undefined behavior (meaning that anything might happen).

What happens when an overflow occurs?

Overflow occurs when: Two negative numbers are added and an answer comes positive or. Two positive numbers are added and an answer comes as negative.

How do you solve integer overflow problems?

In languages where integer overflow can occur, you can reduce its likelihood by using larger integer types, like Java's long or C's long long int. If you need to store something even bigger, there are libraries built to handle arbitrarily large numbers.


1 Answers

You can answer many of your questions by reading the help page ?integer. It says:

R uses 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9.

Expanding to larger integers is under consideration by R Core but it's not going to happen in the near future.

If you want a "bignum" capacity then install Martin Maechler's Rmpfr package [PDF]. I recommend the 'Rmpfr' package because of its author's reputation. Martin Maechler is also heavily involved with the Matrix package development, and in R Core as well. There are alternatives, including arithmetic packages such as 'gmp', 'Brobdingnag' and 'Ryacas' package (the latter also offers a symbolic math interface).

Next, to respond to the critical comments in the answer you linked to, and how to assess the relevance to your work, consider this: If there were the same statistical functionality available in one of those "modern" languages as there is in R, you would probably see a user migration in that direction. But I would say that migration, and certainly growth, is in the R direction at the moment. R was built by statisticians for statistics.

There was at one time a Lisp variant with a statistics package, Xlisp-Stat, but its main developer and proponent is now a member of R-Core. On the other hand one of the earliest R developers, Ross Ihaka, suggests working toward development in a Lisp-like language [PDF]. There is a compiled language called Clojure (pronounced as English speakers would say "closure") with an experimental interface, Rincanter.

Update:

The new versions of R (3.0.+) has 53 bit integers of a sort (using the numeric mantissa). When an "integer" vector element is assigned a value in excess of '.Machine$integer.max', the entire vector is coerced to "numeric", a.k.a. "double". Maximum value for integers remains as it was, however, there may be coercion of integer vectors to doubles to preserve accuracy in cases that would formerly generate overflow. Unfortunately, the length of lists, matrix and array dimensions, and vectors is still set at integer.max.

When reading in large values from files, it is probably safer to use character-class as the target and then manipulate. If there is coercion to NA values, there will be a warning.

like image 123
IRTFM Avatar answered Sep 28 '22 05:09

IRTFM