In R source code, most (but not all) functions use integer values for constants:
colnames <- function(x, do.NULL = TRUE, prefix = "col")
{
    if(is.data.frame(x) && do.NULL)
        return(names(x))
    dn <- dimnames(x)
    if(!is.null(dn[[2L]]))
        dn[[2L]]
    else {
        nc <- NCOL(x)
        if(do.NULL) NULL
        else if(nc > 0L) paste0(prefix, seq_len(nc))
        else character()
    }
}
The R Language Definition says:
In most cases, the difference between an integer and a numeric value will be unimportant as R will do the right thing when using the numbers. There are, however, times when we would like to explicitly create an integer value for a constant.
The question is about good practice and the rationale, not about e.g. the "L" notation itself, the difference between integer class and numeric class, or comparing numbers.
Data that consists only of numbers, whether decimals or whole numbers, is called numeric data. The numbers can be positive or negative. Data that consists only of whole numbers is called integer data.
R's integers are 32-bit long integers, so "L" appears to be a sensible shorthand for referring to this data type.
The as.integer() function in R converts an object, such as a character vector, to an integer object. Syntax: as.integer(x)
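A small sketch of what as.integer() does for a few common input types. The variable names are illustrative only and do not come from the original text.

```r
# Character input: digit strings parse; anything else becomes NA (with a warning)
from_chr <- as.integer(c("42", "7"))
not_a_number <- suppressWarnings(as.integer("abc"))

# Double input: the fractional part is dropped (truncation toward zero)
from_dbl <- as.integer(c(3.9, -3.9))

# Logical input: TRUE becomes 1L, FALSE becomes 0L
from_lgl <- as.integer(c(TRUE, FALSE))

print(from_chr)      # 42 7
print(not_a_number)  # NA
print(from_dbl)      # 3 -3
print(from_lgl)      # 1 0
```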
From the Constants Section of the R Language Definition: We can use the 'L' suffix to qualify any number with the intent of making it an explicit integer. So '0x10L' creates the integer value 16 from the hexadecimal representation.
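To make the effect of the suffix concrete, here is a minimal sketch comparing the same constant written with and without "L". The variable names are made up for the illustration.

```r
dbl_const <- 16      # no suffix: stored as double
int_const <- 16L     # L suffix: stored as integer
hex_const <- 0x10L   # hexadecimal literal with the L suffix

print(typeof(dbl_const))  # "double"
print(typeof(int_const))  # "integer"
print(typeof(hex_const))  # "integer"

# Same value, but identical() also compares the storage type
same_as_int <- identical(hex_const, int_const)  # TRUE
same_as_dbl <- identical(int_const, dbl_const)  # FALSE
equal_value <- int_const == dbl_const           # TRUE: == coerces before comparing
```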
As you can see "integer" is a subset of "numeric".
> .Machine$integer.max
[1] 2147483647
> .Machine$double.xmax
[1] 1.797693e+308
Integers only go to a little more than 2 billion, while the other numerics can be much bigger. They can be bigger because they are stored as double precision floating point numbers.
R will automatically convert between the numeric classes when needed, so for the most part it does not matter to the casual user whether the number 3 is currently stored as an integer or as a double. Most math is done using double precision, so that is often the default storage.
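The automatic conversion between the numeric types can be observed directly. The following is a minimal sketch with illustrative variable names:

```r
x <- 3L

print(is.integer(x))  # TRUE
print(is.numeric(x))  # TRUE: an integer is also "numeric"

# Mixing an integer with a double promotes the result to double
mixed <- x + 0.5
# Division returns a double even when both operands are integers
halved <- x / 2L
# Integer arithmetic with integer operands stays integer
still_int <- x + 1L

print(typeof(mixed))      # "double"
print(typeof(halved))     # "double"
print(typeof(still_int))  # "integer"
```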
However, doubles can express much larger values than integers (in most languages) because they rely on floating point storage, where you have a value and an exponent. Integers are discretely stored. So, if you need to store huge values, a double may be the better choice.
Thus, the type of data you're handling and the types of operations you want to do on that data determine whether double or int is the better choice.
These are some of the use cases in which I explicitly use the L suffix when declaring constants. Of course these are not strictly "canonical" (or the only ones), but they may give you an idea of the rationale. I added, for each case, a "necessary" flag; you will see that these arise only if you interface with other languages (like C).
Instead of using the classic as.integer, I add 0L to a logical vector to make it integer. Of course you could just use 0, but this would require more memory (typically 8 bytes per element instead of 4) and a conversion.
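A quick sketch of the trick and of the memory difference. The vector names are illustrative, and the exact sizes reported by object.size() depend on the platform, so only the comparison is shown:

```r
flags <- c(TRUE, FALSE, NA, TRUE)

as_int <- flags + 0L   # logical + integer gives integer
as_dbl <- flags + 0    # logical + double gives double

print(typeof(as_int))  # "integer"
print(typeof(as_dbl))  # "double"

# Same result as the classic conversion
same_result <- identical(as_int, as.integer(flags))

# On a larger vector the integer version takes less memory than the double one
big <- rep(TRUE, 1e6)
int_bytes <- as.numeric(object.size(big + 0L))
dbl_bytes <- as.numeric(object.size(big + 0))
print(int_bytes < dbl_bytes)  # TRUE
```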
Say, for instance, that you want to retrieve the elements of a vector that come right after an NA. You could use:
which(is.na(vec)) + 1L
Since which returns an integer, adding 1L will preserve the type and avoid an implicit conversion. Nothing will happen if you omit the L, since it's just a small optimization. The same applies to match, for instance: if you want to post-process the result of such a function, it's a good habit to preserve the type if possible.
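As a worked example of the point above, here is a short sketch with a made-up vector showing how the result type depends on the constant:

```r
vec <- c(5, NA, 7, 8, NA, 10)

after_na <- which(is.na(vec)) + 1L     # NA sits at positions 2 and 5
print(after_na)                        # 3 6
print(typeof(after_na))                # "integer"
print(vec[after_na])                   # 7 10

# Without the suffix the result is silently promoted to double
after_na_dbl <- which(is.na(vec)) + 1
print(typeof(after_na_dbl))            # "double"

# match() also returns integer positions (NA when there is no match)
pos <- match(c("b", "z"), c("a", "b", "c"))
shifted <- pos - 1L
print(typeof(shifted))                 # "integer"
```

Both versions index the vector identically; the difference is only in storage type and the extra conversion.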
From ?integer:
Integer vectors exist so that data can be passed to C or Fortran code which expects them, and so that (small) integer data can be represented exactly and compactly.
C is much stricter regarding data types. This implies that, if you pass a vector to a C function, you cannot rely on C to do the conversions. Say that you want to replace the elements after an NA with some value, say 42. You find the positions of the NA values at the R level (as we did before with which) and then pass the original vector and the vector of indices to C. The C function will look like:
SEXP replaceAfterNA (SEXP X, SEXP IND) {
    ...
    int *ind = INTEGER(IND);
    ...
    for (i=0; i<l; i++) {
        //make here the replacement
    }
}
and then from the R side:
...
ind <- which(is.na(x)) + 1L
.Call("replaceAfterNA", x, ind)
...
If you omit the L in the first line above, you will receive an error like:
INTEGER() cannot be applied to double vectors
since C is expecting an integer type.
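If the indices come from code you do not control, one option is to check and coerce them on the R side before the .Call. The sketch below is only an illustration: as_index is a hypothetical helper, not part of R or of the original answer.

```r
x <- c(1, NA, 3, NA, 5)
ind <- which(is.na(x)) + 1      # L omitted, so this is a double vector
print(typeof(ind))              # "double"

# Hypothetical helper: refuse non-whole values, then coerce to integer
as_index <- function(i) {
    stopifnot(is.numeric(i), all(i == trunc(i), na.rm = TRUE))
    as.integer(i)
}

ind <- as_index(ind)
print(typeof(ind))              # "integer"
print(ind)                      # 3 5
```

After the coercion the vector has the storage type that INTEGER() expects on the C side.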
Same as before. If you use the rJava package and want R to call your own custom Java classes and methods, you have to be sure that an integer is passed when the Java method requires an integer. Not adding a specific example here, but it should be clear why you may want to use the L suffix in constants in these cases.
Addendum
The previous cases were about when you may want to use L. Even if it is, I guess, much less common, it might be useful to add a case in which you don't want the L. This may arise if there is a danger of integer overflow. The *, + and - operators preserve the type if both operands are integers. For example:
#this overflows
31381938L*3231L
#[1] NA
#Warning message:
#In 31381938L * 3231L : NAs produced by integer overflow
#this does not
31381938L*3231
#[1] 1.01395e+11
So, if you are doing operations on an integer variable which might produce overflow, it's important to cast it to double to avoid any risk. Adding or subtracting a constant without the L to that variable might be as good an occasion as any to make the cast.
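The following sketch shows the overflow at the integer limit and two ways of making the cast. The variable names are illustrative only.

```r
n <- .Machine$integer.max            # 2147483647

# Integer + integer overflows to NA (R also emits a warning)
overflowed <- suppressWarnings(n + 1L)
print(overflowed)                    # NA

# Dropping the L makes the constant a double, so the sum is computed as double
promoted <- n + 1
print(typeof(promoted))              # "double"
print(promoted == 2147483648)        # TRUE

# An explicit cast before the multiplication from the example above
a <- 31381938L
safe_product <- as.numeric(a) * 3231L
print(safe_product == 101395041678)  # TRUE
```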