 

When should integers be explicitly specified?

Tags: integer, r

I often write R code where I test the length of a vector, the number of rows in a data frame, or the dimensions of a matrix, for example if (length(myVector) == 1). While poking around in some base R code, I noticed that in such comparisons values are explicitly stated as integers, usually using the 'L' suffix, for example if (nrow(data.frame) == 5L). Explicit integers are also sometimes used for function arguments, for example these statements from the cor function: x <- matrix(x, ncol = 1L) and apply(u, 2L, rank, na.last = "keep"). When should integers be explicitly specified in R? Are there any potentially negative consequences from not specifying integers?
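The distinction the question is about can be checked directly in base R. A minimal sketch; `v` and `df` are hypothetical example objects:

```r
# Numeric literals without a suffix are doubles; the L suffix makes an integer.
typeof(5)                  # "double"
typeof(5L)                 # "integer"

# length() and nrow() already return integers.
v <- c(10, 20, 30)         # hypothetical example vector
typeof(length(v))          # "integer"
df <- data.frame(a = 1:5)  # hypothetical example data frame
typeof(nrow(df))           # "integer"

# `==` converts the integer to double before comparing, so both tests are TRUE.
length(v) == 3
length(v) == 3L
```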

pistachionut asked Nov 26 '12 at 03:11



2 Answers

You asked:

Are there any potentially negative consequences from not specifying integers?

There are situations where it is more likely to matter. From Chambers, Software for Data Analysis, p. 193:

Integer values will be represented exactly as "double" numbers so long as the absolute value of the integer is less than 2^m, the length of the fractional part of the representation (2^54 for 32-bit machines).

It is easy to see how a calculated value can look like an integer without quite being one:

> (seq(-.45,.45,.15)*100)[3]
[1] -15
> (seq(-.45,.45,.15)*100)[3] == -15L
[1] FALSE
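Because `==` compares exactly, a common base R workaround is to compare with a tolerance, or to round before comparing. A minimal sketch reusing the expression above; `all.equal` is used with its default tolerance:

```r
x <- (seq(-.45, .45, .15) * 100)[3]
print(x, digits = 17)              # more digits reveal the small error
x == -15L                          # FALSE: exact comparison
isTRUE(all.equal(x, -15))          # TRUE: equal within all.equal's default tolerance
round(x) == -15L                   # TRUE: rounding to a whole number removes the error
```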

However, it is harder to find an example where a whole number typed in directly is not stored exactly in the floating-point representation, at least until you reach the large values Chambers describes.
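Those large values are easy to reach in practice. A sketch, assuming the IEEE 754 double precision that R uses on common platforms, where the significand has 53 bits (reported by `.Machine$double.digits`):

```r
# Number of significand bits in a double (53 on IEEE 754 platforms).
.Machine$double.digits

# Above 2^53, neighbouring whole numbers can no longer be told apart.
2^53 + 1 == 2^53                        # TRUE: 2^53 + 1 rounds back to 2^53

# The same happens to a whole number typed in directly.
9007199254740993 == 9007199254740992    # TRUE: the literal is not stored exactly
```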

Ari B. Friedman answered Sep 24 '22 at 00:09



Using 1L and similar literals is programmatically safe: it states exactly what is meant and does not rely on any type conversion.

When writing code interactively, it is easy to notice errors and fix them along the way; however, if you are writing a package (including base R itself), it is safer to be explicit.
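One concrete conversion that explicit integers avoid: assigning a double into an integer vector silently changes the type of the whole vector. A minimal sketch; `counts` is a hypothetical vector:

```r
counts <- 1:3          # hypothetical vector; `:` yields integers here
counts[2] <- 5         # 5 is a double literal
typeof(counts)         # "double": the whole vector was converted

counts <- 1:3
counts[2] <- 5L        # integer literal, so no conversion is needed
typeof(counts)         # "integer"
```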

When you are testing for equality, floating-point numbers can cause precision issues. See this FAQ.

Explicitly specifying integers avoids this, because nrow and length return integers and the index arguments to apply expect integers.
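The integer result of nrow() becomes visible with identical(), which checks type as well as value, while == still treats 5 and 5L as equal. A minimal sketch; `df` is a hypothetical data frame:

```r
df <- data.frame(a = 1:5)     # hypothetical example data frame
n <- nrow(df)
identical(n, 5L)              # TRUE: same type and value
identical(n, 5)               # FALSE: 5 is a double
n == 5                        # TRUE: `==` coerces n to double before comparing
```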

mnel answered Sep 24 '22 at 00:09
