
What is the name of this syntax trick & where is it documented?

Tags: syntax, r

Haven't run into this before. From the help page of pairs.panels in package psych, one finds the following:

data(iris)
pairs.panels(iris[1:4],bg=c("red","yellow","blue")[iris$Species],pch=21)

I want to ask about this argument, which sets the background color of the circles drawn for the data points: bg=c("red","yellow","blue")[iris$Species]. Clearly, this argument associates the 3 levels of iris$Species, a factor, with the 3 colors given. I'm not asking about what it does.

I am wondering what this way of associating passed arguments with data levels on the fly is called, and where it is documented. It seems like some R magic. If I were writing this function, I would likely pass the colors and the column name of the factor separately and then make the association manually behind the scenes. This trick could be very useful.

But on the face of it, [iris$Species] looks like the data is indexing itself. You can't type [iris$Species] in the console, for instance; it just gives an error. You can type c("red","yellow","blue")[iris$Species] and get the correct answer. It seems like there might be some recycling going on, but I'm not sure.

I'd be curious about where this is documented, and whether anyone can explain what's happening in a short sentence or two. For instance, is iris$Species being converted to integer and then used to index the vector of 3 colors? I'm thinking that's it, but I'd like another opinion.

Note: the same trick is used in graphics::pairs, on which pairs.panels is based.

Bryan Hanson asked Jul 23 '13 18:07


1 Answer

There are two things going on here:

  1. The factor iris$Species is being coerced to its underlying integer codes.
  2. These integer indices are then used to subset the color vector in the usual way.

Coercion

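The coercion matters because the factor labels are not red/yellow/blue in this case. Indexing by the integer codes reproduces the result, whereas indexing by the labels as character strings finds no matching names and returns NA: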

> all( c("red","yellow","blue")[iris$Species] == c("red","yellow","blue")[as.integer(iris$Species)] )
[1] TRUE
> all( c("red","yellow","blue")[iris$Species] == c("red","yellow","blue")[as.character(iris$Species)] )
[1] NA
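The same coercion can be seen on a smaller scale. The following is only a sketch with a made-up factor `f` and made-up names (`cols`, `named_cols`), none of which come from the psych help page; it shows that indexing by a factor behaves like indexing by its integer codes, and that character indexing only works when the vector being indexed has names.

```r
# Hypothetical factor; its levels are sorted alphabetically: "a", "b", "c"
f <- factor(c("b", "a", "c", "a"))
cols <- c("red", "yellow", "blue")

codes <- as.integer(f)   # integer code of each element: 2 1 3 1
by_factor <- cols[f]     # the factor is used via its integer codes
by_codes <- cols[codes]  # explicit version of the same lookup

# Character indexing looks up names, so it needs a named vector
named_cols <- c(a = "red", b = "yellow", c = "blue")
by_name <- unname(named_cols[as.character(f)])

print(by_factor)
```

With a named vector, the association between levels and colors no longer depends on the order of the levels, which may be the safer choice when the level order is not known in advance.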

Indexing with repeated elements

In R, whenever you index a simple vector, any index value that is repeated selects its element repeatedly.

> x <- letters[1:5]
> x
[1] "a" "b" "c" "d" "e"
> x[c(1,3)]
[1] "a" "c"
> x[c(1,3,3,3,3)]
[1] "a" "c" "c" "c" "c"

This is commonly exploited when sampling with replacement.
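As a minimal sketch of that use, assuming a hypothetical vector `x` and an arbitrary seed, one can draw indices with `sample()` and let the repeated indices pull the same elements more than once:

```r
x <- letters[1:5]

set.seed(1)  # arbitrary seed, only so the draw is reproducible
# Draw 10 positions from 1..5 with replacement; some must repeat
idx <- sample(length(x), size = 10, replace = TRUE)

# Repeated positions in idx return the same element repeatedly
resampled <- x[idx]
print(resampled)
```

The result has length 10 even though `x` has only 5 elements, for the same reason the color vector of length 3 expands to one color per row of iris.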

Where is this documented?

In a variety of places, although it's not always emphasized how cool it is.

For instance, page 11 of W. N. Venables, D. M. Smith, and the R Development Core Team, An Introduction to R. Notes on R: A Programming Environment for Data Analysis and Graphics, Version 2.5.0 (2007-04-23), states:

> x[1:10]
selects the first 10 elements of x (assuming length(x) is not less than 10). Also
> c("x","y")[rep(c(1,2,2,1), times=4)]
(an admittedly unlikely thing to do) produces a character vector of length 16 consisting of
"x", "y", "y", "x" repeated four times.
Ari B. Friedman answered Nov 13 '22 09:11