 

Why does item assignment to a non-existent data.frame column work?

Tags:

r

Inspired by Q6437164: can someone explain to me why the following works:

iriscopy <- iris  # or any other data.frame
iriscopy$someNonExistantColumn[1] <- 15

It is not obvious to me how R interprets this statement as "create a new column named someNonExistantColumn in the data.frame, and set its first value (in fact, apparently all of its values) to 15".
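A minimal check in base R, using only the built-in iris data set, confirms the observation in the question: after the assignment the new column exists, has one entry per row, and every entry is 15, not only the first.

```r
# Work on a copy so the built-in iris data set stays untouched
iriscopy <- iris
# Assign to element 1 of a column that does not exist yet
iriscopy$someNonExistantColumn[1] <- 15

# The column has been created with one entry per row...
print(length(iriscopy$someNonExistantColumn))
# ...and tabulating it shows every entry is 15, not just the first one
print(table(iriscopy$someNonExistantColumn))
```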

Nick Sabbe asked Jun 22 '11 09:06


2 Answers

The R Language Definition manual gives us a pointer to how R evaluates expressions of the form:

x$foo[1] <- 15

Namely, it is as if we had called

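`*tmp*` <- x
# The inner calls act on `*tmp*`$foo (NULL here), not on a data frame,
# so only the outermost replacement dispatches to a data.frame method
x <- "$<-.data.frame"(`*tmp*`, name = "foo",
                      value = "[<-"("$"(`*tmp*`, "foo"),
                                    1, value = 15))
rm(`*tmp*`)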

The middle bit might be easier to grapple with if, for purposes of exposition, we drop the specific methods and write only the generic functions:

x <- "$<-"(`*tmp*`, name = "foo", 
           value = "[<-"("$"(`*tmp*`, "foo"), 1, value = 15))
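As a sanity check, the generic expansion can be run by hand and compared with the ordinary syntax. This is a sketch assuming only base R; `x_sugar` and `x_manual` are illustrative names that do not appear in the original answer.

```r
# Ordinary "convenience" syntax
x_sugar <- iris
x_sugar$foo[1] <- 15

# The same assignment written out as explicit calls to the generics
x_manual <- iris
`*tmp*` <- x_manual
x_manual <- "$<-"(`*tmp*`, "foo",
                  value = "[<-"("$"(`*tmp*`, "foo"), 1, value = 15))
rm(`*tmp*`)

# Both routes should produce the same data frame
print(identical(x_sugar, x_manual))
```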

To go back to your example using iris, we have something like

iris$foo[1] <- 15

Here, the calls are evaluated from the inside out. First, the extractor function "$" is used to access component "foo" of iris; since iris has no such component, the result is NULL:

> "$"(iris, "foo")
NULL

Then, "[<-" is used to replace the first element of the object returned above (the NULL) with the value 15, i.e. a call of:

> "[<-"(NULL, 1, value = 15)
[1] 15

This result is then used as the value argument in the outermost part of the call, namely the assignment using "$<-":

> head("$<-"(iris, "foo", value = 15))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species foo
1          5.1         3.5          1.4         0.2  setosa  15
2          4.9         3.0          1.4         0.2  setosa  15
3          4.7         3.2          1.3         0.2  setosa  15
4          4.6         3.1          1.5         0.2  setosa  15
5          5.0         3.6          1.4         0.2  setosa  15
6          5.4         3.9          1.7         0.4  setosa  15

(here wrapped in head() to limit the number of rows shown.)
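The three stages just described can also be kept in intermediate variables so each one can be inspected separately. A sketch in base R; the names `step1`, `step2` and `result` are purely illustrative.

```r
# Stage 1: extract the missing component; `$` returns NULL for an absent column
step1 <- "$"(iris, "foo")
print(is.null(step1))

# Stage 2: replace element 1 of that NULL; the default `[<-`
# turns it into a length-1 numeric vector
step2 <- "[<-"(step1, 1, value = 15)
print(step2)

# Stage 3: pass the length-1 vector to `$<-`, which dispatches
# to the data.frame replacement method
result <- "$<-"(iris, "foo", value = step2)
print(head(result$foo))
```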

That hopefully explains how the function calls progress. The last issue to deal with is why the entire column foo is set to 15. The answer is given in the Details section of ?"$<-.data.frame":

Details:

....

         Note that there is no ‘data.frame’ method for ‘$’, so ‘x$name’
     uses the default method which treats ‘x’ as a list.  There is a
     replacement method which checks ‘value’ for the correct number of
     rows, and replicates it if necessary.

The key bit is the last sentence. In the above example, the outermost assignment used value = 15, but at this point we are replacing the entire component "foo", which must have length nrow(iris). Hence, what is actually used in the outermost assignment/function call is value = rep(15, nrow(iris)).
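To see that the replication comes from the data.frame replacement method and not from "[<-", one can compare a plain list with a small data frame and vary the index. This is a sketch in base R with illustrative variable names; the data-frame cases follow the rule in the quoted help text, where a shorter value is replicated only when its length divides the number of rows.

```r
# Plain list: no data.frame method is involved, so foo stays a length-1 vector
l <- list(a = 1:3)
l$foo[1] <- 15
print(length(l$foo))

# Data frame: `$<-.data.frame` replicates the length-1 value to nrow(df)
df <- data.frame(a = 1:6)
df$foo[1] <- 15
print(df$foo)

# Index 3: `[<-` on NULL pads with NA, giving c(NA, NA, 15);
# length 3 divides the 6 rows, so that pattern is recycled twice
df2 <- data.frame(a = 1:6)
df2$foo[3] <- 15
print(df2$foo)

# Index 4: the value c(NA, NA, NA, 15) has length 4,
# which does not divide 6 rows, so the method signals an error
df3 <- data.frame(a = 1:6)
res <- tryCatch({ df3$foo[4] <- 15; "no error" },
                error = function(e) conditionMessage(e))
print(res)
```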

This example is all the more complex because you have to convert from the convenience notation of

x$foo[1] <- 15

into proper function calls using "$<-"(), "[<-"(), and "$"(). Section 3.4.4 of The R Language Definition uses this simpler example:

names(x)[3] <- "Three"

which evaluates to

`*tmp*` <- x
x <- "names<-"(`*tmp*`, value="[<-"(names(`*tmp*`), 3, value="Three"))
rm(`*tmp*`)
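The names() expansion can be checked the same way. A minimal sketch in base R; the four-element vector `x` and its copy `y` are hypothetical inputs chosen only for the comparison.

```r
x <- c(a = 1, b = 2, c = 3, d = 4)
y <- x

# Convenience syntax
names(x)[3] <- "Three"

# Explicit expansion as given in the R Language Definition
`*tmp*` <- y
y <- "names<-"(`*tmp*`, value = "[<-"(names(`*tmp*`), 3, value = "Three"))
rm(`*tmp*`)

print(names(x))
print(identical(x, y))
```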

which is slightly easier to get your head around because names() looks like a usual function call.

Gavin Simpson answered Oct 29 '22 11:10


I think the answer is that it doesn't work.

I consider $newcol to be the standard way to create a new column. For example:

iris$newcol <- 1

will create a new column in the iris data.frame. All values will be 1, because of vector recycling.

This creation of a new column gets triggered when the expression evaluates to NULL. From ?"$<-":

  • "When $<- is applied to a NULL x, it first coerces x to list(). This is what also happens with [[<- if the replacement value value is of length greater than one: if value has length 1 or 0, x is first coerced to a zero-length vector of the type of value."

So I think what happens here is that the expression evaluates to NULL, and this triggers the code to create a new column, which in turn uses vector recycling to fill the values.
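The first sentence of the quoted help text can be checked in isolation. A minimal sketch in base R; `x` is a throwaway name, and this only exercises `$<-` on a NULL object, not the data-frame case itself.

```r
# Start from NULL and assign a component with `$<-`
x <- NULL
x$a <- 1
# Per the help text, x was first coerced to list(), so the result is a list
print(is.list(x))
str(x)
```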

Edit

The parsing probably works using $-assign $<- rather than bracket-assign [<-. Compare:

head(`$<-`(iris, newcol, 1))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species newcol
1          5.1         3.5          1.4         0.2  setosa      1
2          4.9         3.0          1.4         0.2  setosa      1
3          4.7         3.2          1.3         0.2  setosa      1
4          4.6         3.1          1.5         0.2  setosa      1
5          5.0         3.6          1.4         0.2  setosa      1
6          5.4         3.9          1.7         0.4  setosa      1

But bracket assign produces an error:

head(`[<-`(iris, newcol, 1))
Error in head(`[<-`(iris, newcol, 1)) : 
  error in evaluating the argument 'x' in selecting a method for function 'head': Error in is.atomic(value) : 'value' is missing
Andrie answered Oct 29 '22 10:10