 

What's the difference between as.integer() and +0L used on booleans?

Tags:

r

I saw +0L used in an answer to another question and found that it works well with matrices, data frames and data tables, whereas as.integer() does not preserve the original structure (dimensions, class) of the data.

> a <- matrix(TRUE, nrow=3, ncol=3)
> a
     [,1] [,2] [,3]
[1,] TRUE TRUE TRUE
[2,] TRUE TRUE TRUE
[3,] TRUE TRUE TRUE
> as.integer(a)
[1] 1 1 1 1 1 1 1 1 1
> a+0L
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    1
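The data-frame case mentioned above can be sketched as follows, assuming a small hypothetical data frame `df` of logical columns. `+ 0L` is dispatched to `Ops.data.frame` and returns a data frame, while `as.integer()` sees the underlying list and cannot coerce it.

```r
# Hypothetical example data: a data frame with two logical columns
df <- data.frame(x = c(TRUE, FALSE, TRUE), y = c(FALSE, FALSE, TRUE))

# Arithmetic on a data frame dispatches to Ops.data.frame, which works
# column by column and returns a data frame with integer columns
res <- df + 0L
str(res)

# as.integer() sees the underlying list, whose elements have length > 1,
# and signals an error instead of converting
msg <- tryCatch(as.integer(df), error = function(e) conditionMessage(e))
msg
```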
  • Are there any other differences between these approaches?
  • What are the pros, cons and caveats of using one or the other?

[edit:] Lots of wisdom in the comments! Apparently there are many different ways to achieve the same result, some of which I had no idea about, so:

  • What are the other ways to achieve what a+0L does?
asked Feb 09 '15 14:02 by LauriK


3 Answers

(This answer adds no other alternative to the ones already present, but I'm posting just to tidy up comments in this thread.)

as.integer, by definition, behaves like as.vector, i.e. it strips all attributes ("dim" included) to create a plain R vector; it does not simply return the same object with a different typeof. To restore attributes after the coercion, "dim<-", "names<-", "class<-" etc. need to be called explicitly, or the coercion has to go through a function that keeps the attributes of its target (e.g. "[<-"). For example: "dim<-"(as.integer(a), dim(a)), array(as.integer(a), dim(a)) or a[] <- as.integer(a). A benchmark:

x = matrix(T, 1e3, 1e3)
microbenchmark::microbenchmark("dim<-"(as.integer(x), dim(x)),
                               array(as.integer(x), dim(x)), 
                               { x[] = as.integer(x) }, times = 25)
#Unit: milliseconds
#                           expr      min       lq   median        uq      max neval
# `dim<-`(as.integer(x), dim(x)) 1.650232 1.691296 2.492748  4.237985  5.67872    25
#   array(as.integer(x), dim(x)) 6.226130 6.638513 8.526779  8.973268 47.50351    25
#    {     x[] = as.integer(x) } 7.822421 8.071243 9.658487 10.408435 11.90798    25

In the above, "dim<-" just adds an attribute to the vector created by as.integer(x); array allocates a new vector to store the result of as.integer(x); and "[<-" first changes "x" so that it can accept the values of as.integer(x) and then iterates through "x" to insert the new values.
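As a quick sanity check, the three attribute-restoring approaches from the benchmark should produce the same integer matrix. This sketch assumes a small 3 x 3 logical matrix as test data rather than the 1e3 x 1e3 one used for timing.

```r
# Assumed test data: a small logical matrix
x <- matrix(TRUE, nrow = 3, ncol = 3)

# 1. Coerce, then re-attach the "dim" attribute
r1 <- `dim<-`(as.integer(x), dim(x))
# 2. Coerce, then let array() rebuild an array with the same dims
r2 <- array(as.integer(x), dim(x))
# 3. Subassign the coerced values into a copy of x; "[<-" keeps its attributes
r3 <- x
r3[] <- as.integer(x)

# as.integer() on its own returns a plain vector without "dim"
dim(as.integer(x))
identical(r1, r2)
identical(r1, r3)
```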

The "[<-" method, though, has a disadvantage:

x = as.character(1:5)
x
#[1] "1" "2" "3" "4" "5"
x[] = as.integer(x)
x
#[1] "1" "2" "3" "4" "5"

Or:

x = 1:5
x
#[1] 1 2 3 4 5
x[] = as.logical(x)
x
#[1] 1 1 1 1 1

But:

x = round(runif(5), 2)
x
#[1] 0.68 0.54 0.02 0.14 0.08
x[] = as.character(x)
x
#[1] "0.68" "0.54" "0.02" "0.14" "0.08"

I.e. "[<-" won't change the typeof of the object being replaced if the typeof of the replacement is the same or lower (e.g. logical < integer < double < character); a higher replacement coerces the object upwards, as in the last example. Subassignment (i.e. "[<-") coerces either the object being replaced, or the replacement, or neither, depending on their typeofs (this is done by SubassignTypeFix). @Josh O'Brien notes that "[<-" might behave differently when the indices are missing. To be honest, I could not find any specific treatment of that case, unlike, for example, do_subset_dflt ("["), which indirectly handles missingness.
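The coercion rule can be checked directly. The helper `subassign_type` below is hypothetical (not part of base R); it just subassigns a value into every element of a target and reports the resulting typeof.

```r
# Hypothetical helper: subassign `value` into all of `target`
# and report the resulting typeof
subassign_type <- function(target, value) {
  target[] <- value
  typeof(target)
}

subassign_type(as.character(1:5), 1:5)       # "character": replacement is lower
subassign_type(1:5, rep(TRUE, 5))            # "integer": replacement is lower
subassign_type(c(0.5, 1.5), c("a", "b"))     # "character": replacement is higher
subassign_type(c(TRUE, FALSE), c(1.5, 2.5))  # "double": replacement is higher
```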

As already mentioned, there is also "storage.mode<-" to change the typeof of an object:

"storage.mode<-"(as.character(1:5), "integer")
#[1] 1 2 3 4 5
"storage.mode<-"(1:5, "logical")
#[1] TRUE TRUE TRUE TRUE TRUE
"storage.mode<-"(round(runif(5), 2), "character")
#[1] "0.09" "0.38" "0.98" "0.73" "0.81"

x = matrix(T, 1e3, 1e3)
microbenchmark::microbenchmark("storage.mode<-"(x, "integer"), 
                               "dim<-"(as.integer(x), dim(x)), times = 25)
#Unit: milliseconds
#                           expr      min      lq   median       uq      max neval
# `storage.mode<-`(x, "integer") 1.986055 2.01842 2.147181 2.406096 6.019415    25
# `dim<-`(as.integer(x), dim(x)) 1.984664 2.02016 2.111684 2.613854 6.174973    25

This is similar in efficiency to "dim<-", since both coerce once and store an attribute.
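Unlike as.integer, "storage.mode<-" keeps the attributes of its argument. A minimal sketch, assuming a named logical vector and a logical matrix with dimnames as test data:

```r
# Assumed test data: a named logical vector and a logical matrix with dimnames
v <- c(a = TRUE, b = FALSE, c = TRUE)
m <- matrix(c(TRUE, FALSE, TRUE, FALSE), nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))

# as.integer() drops names (and would drop dim/dimnames too)
as.integer(v)

# storage.mode<- changes only the typeof; names, dim and dimnames survive
storage.mode(v) <- "integer"
storage.mode(m) <- "integer"
v
m
```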

Binary operations (as mentioned by James and Konrad Rudolph) coerce their arguments to a suitable typeof and keep attributes ("dim", "names", "class" etc.) according to rules that depend on both arguments (see the "Value" section of ?Arithmetic).
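In the common case where one operand is a bare scalar such as 0L, the attributes of the other operand carry over to the result. A small sketch, assuming a named logical vector and a logical matrix as test data:

```r
# Assumed test data
v <- c(a = TRUE, b = FALSE)
m <- matrix(TRUE, nrow = 2, ncol = 2)

# The scalar 0L has no attributes, so those of the other argument are kept
v + 0L          # named integer vector: a = 1, b = 0
dim(m + 0L)     # still 2 x 2
typeof(m + 0L)  # logical was coerced to integer
```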

answered Nov 15 '22 07:11 by alexis_laz


x + 0L is an element-wise operation on x; as such, it often preserves the shape of the data. as.integer isn’t: it takes the whole structure (here, a matrix) and converts it into a one-dimensional integer vector.

That said, in the general case I’d strongly suggest using as.integer and discourage + 0L as a clever hack (remember: often, clever ≠ good). If you want to preserve the shape of data I suggest using David’s method from the comments, rather than the + 0L hack:

a[] = as.integer(a)

This uses the normal meaning of as.integer, but the result is assigned to the individual elements of a, rather than a itself. In other words, a’s shape remains untouched.
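a[] = as.integer(a) relies on as.integer working on the whole object, which fails for a data frame. A common variant, sketched here with a hypothetical logical data frame `df`, converts column by column with lapply() and assigns back through `df[]` so the data frame structure survives:

```r
# Matrix case from the question: values change, shape stays
a <- matrix(TRUE, nrow = 3, ncol = 3)
a[] <- as.integer(a)

# Data frame case (hypothetical data): as.integer(df) would fail,
# so convert column by column and assign back through df[]
df <- data.frame(x = c(TRUE, FALSE), y = c(FALSE, TRUE))
df[] <- lapply(df, as.integer)
str(df)
```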

answered Nov 15 '22 07:11 by Konrad Rudolph


Adding 0L promotes a to integer as described in ?Arithmetic:

Logical vectors will be coerced to integer or numeric vectors, FALSE having value zero and TRUE having value one.

As a consequence, any arithmetic operation between a and the identity element for that operation will work, as long as the operation does not produce a double (as / and ^ always do):

a+0L
a-0L
a*1L
a%/%1L

Unary operations will also work, so perhaps the "best" code golf version is:

--a

This has a parallel with the common trick of using !!a to convert a numeric object to logical.

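Since identical() compares only two objects, each variant is checked against a+0L in turn:

all(sapply(list(a-0L, a*1L, a%/%1L, --a), identical, a+0L))
[1] TRUE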
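The integer-preserving variants can be contrasted with operations that return doubles. This sketch reuses the question's 3 x 3 logical matrix and checks the typeof of each result.

```r
a <- matrix(TRUE, nrow = 3, ncol = 3)

# Identity operations whose result stays integer
variants <- list(a + 0L, a - 0L, a * 1L, a %/% 1L, --a)
sapply(variants, typeof)
all(sapply(variants, identical, a + 0L))

# A double operand, or the operators / and ^, give a double result instead
typeof(a %/% 1)
typeof(a / 1L)
typeof(a ^ 1L)
```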

Converting back to logical:

identical(a, !!--a)
[1] TRUE
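The !! trick maps zero to FALSE and any non-zero number to TRUE, which is what makes the round trip work. A small sketch, assuming a numeric vector and a logical matrix containing both values as test data:

```r
# !! maps zero to FALSE and any non-zero number to TRUE
n <- c(-2.5, 0, 3)
!!n

# Round trip on a logical matrix with both values (assumed test data)
a <- matrix(c(TRUE, FALSE, TRUE, FALSE), nrow = 2)
i <- --a            # integer matrix of 1s and 0s
identical(a, !!i)   # back to the original logical matrix
```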

An alternative, and perhaps clearer, approach is to change the storage.mode of a directly:

storage.mode(a) <- "integer"
a
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    1
answered Nov 15 '22 05:11 by James