
Understanding code for custom in-place modification function?

Tags: r, data.table

I came across this post: http://r.789695.n4.nabble.com/speeding-up-perception-tp3640920p3646694.html from Matt Dowle, discussing what appear to be some early implementation ideas for the data.table package.

He uses the following code:

x = list(a = 1:10000, b = 1:10000) 
class(x) = "newclass" 
"[<-.newclass" = function(x,i,j,value) x      # i.e. do nothing 
tracemem(x)
x[1, 2] = 42L 

Specifically I am looking at:

"[<-.newclass" = function(x,i,j,value) x

I am trying to understand what is done there and how I could use this notation.

It looks to me like:

  • i is the row index
  • j is column index
  • value is the value to be assigned
  • x is the object under consideration

My best guess would therefore be that I define a custom function for in-place modification (for a given class).

That is, "[<-.newclass" would be the in-place modification method for class newclass.

Understanding what happens: Usually the following code should return an error:

x = list(a = 1:10000, b = 1:10000) 
x[1, 2] = 42L 

So I guess the sample code does not have any practical use.

Attempt to use the logic:

A simple nonsense try would be to square the value to be inserted:

x[i, j] <- value^2

Full try:

> x = matrix(1:9, 3, 3)
> class(x) = "newclass"
> "[<-.newclass" = function(x, i, j, value) x[i, j] <- value^2 # i.e. do something
> x[1, 2] = 9
Error: C stack usage  19923536 is too close to the limit

This doesn't seem to work.

My question(s):

"[<-.newclass" = function(x,i,j,value) x 

How exactly does this notation work and how would I use it?

(I added the data.table tag since the linked discussion is about "by-reference" in-place modification in data.table, I think.)

Tonio Liebrand asked Aug 09 '18 14:08



1 Answer

The `[<-`() function is (traditionally) used for subassignment, and is, more broadly, a type of replacement function. It is also generic (more specifically, an internal generic), which allows you to write custom methods for it, as you correctly surmised.

Replacement functions

In general, when you call a replacement function, such as ...

foo(x) <- bar(y)

... the expression on the right-hand side of <- (here bar(y)) is passed as the named argument value to `foo<-`(), with x as the first argument, and x is then reassigned the result. That is, the call is equivalent to writing:

x <- `foo<-`(x, value = bar(y))

For this to work at all, a replacement function must take at least two arguments, one of which must be named value. Most replacement functions have only these two arguments, but there are exceptions, such as `attr<-` and, typically, subassignment.
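As a minimal sketch of the rule above: `second<-` below is a hypothetical replacement function made up for illustration (it is not part of base R), while `attr<-` is the base R example of a replacement function that takes an extra argument.

```r
# Hypothetical replacement function: sets the second element of a vector.
`second<-` <- function(x, value) {
  x[2] <- value
  x  # a replacement function must return the modified object
}

v <- 1:5
second(v) <- 10L                  # sugar for the call on the next lines

w <- 1:5
w <- `second<-`(w, value = 10L)   # the equivalent functional form
same_result <- identical(v, w)

# `attr<-` takes an additional argument (the attribute name):
u1 <- 1:3
attr(u1, "unit") <- "cm"
u2 <- `attr<-`(1:3, "unit", value = "cm")
same_attr <- identical(u1, u2)
```

Both same_result and same_attr should come out TRUE, since the two spellings are just different ways of writing the same call.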

Subassignment

When you have a subassignment call like x[i, j] <- y, i and j get passed as additional arguments to the `[<-`() function with x and y as the first and value arguments, respectively:

x <- `[<-`(x, i, j, value = y) # x[i, j] <- y
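To make the equivalence concrete, here is a small sketch on a plain matrix (no custom class involved), comparing the usual syntax with the functional form. The names m1 and m2 are only for illustration.

```r
m1 <- matrix(1:9, 3, 3)
m1[1, 2] <- 42L                       # usual subassignment syntax

m2 <- matrix(1:9, 3, 3)
m2 <- `[<-`(m2, 1, 2, value = 42L)    # the same operation, written as a function call

same_matrix <- identical(m1, m2)
```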

In the case of a matrix or a data.frame, i and j would be used for selecting rows and columns; but in general, this does not need to be the case. A method for a custom class could do anything with the arguments. Consider this example:

x <- matrix(1:9, 3, 3)
class(x) <- "newclass" 

`[<-.newclass` <- function(x, y, z, value) {
  x + (y - z) * value # absolute nonsense
}

x[1, 2] <- 9
x
#>      [,1] [,2] [,3]
#> [1,]   -8   -5   -2
#> [2,]   -7   -4   -1
#> [3,]   -6   -3    0
#> attr(,"class")
#> [1] "newclass"

Is this useful or reasonable? Probably not. But is it valid R code? Absolutely!

It's less common to see custom subassignment methods in real applications, as `[<-`() usually "just works" as you might expect it to, based on the underlying object of your class. A notable exception is `[<-.data.frame`, where the underlying object is a list, but subassignment behaves matrix-like. (On the other hand, many classes do need a custom subsetting method, as the default `[`() method drops most attributes, including the class attribute, see ?`[` for details).
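To illustrate the parenthetical remark about subsetting, here is a rough sketch of a class-preserving `[` method. The class name "myclass" is hypothetical and used only for this illustration. This is one common pattern rather than the only way to do it.

```r
# "myclass" is a hypothetical class used only for this illustration.
x <- structure(1:5, class = "myclass")
class_before <- class(x[1:2])   # default method: the class attribute is dropped

`[.myclass` <- function(x, ...) {
  # Delegate to the default method, then re-attach the original class.
  structure(NextMethod(), class = oldClass(x))
}

y <- x[1:2]
class_after <- class(y)         # the class now survives subsetting
```

Here class_before should be "integer" and class_after should be "myclass".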


As to why your example doesn't work: remember that you are writing a method for a generic function, and all the regular rules apply. If we use the functional form of `[<-`() and expand the method dispatch in your example, we can see immediately why it fails:

`[<-.newclass` <- function(x, i, j, value) {
  x <- `[<-.newclass`(x, i, j, value = value^2)  # x[i, j] <- value^2
}

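That is, the function was defined recursively, without a base case, resulting in infinite recursion (which is what eventually exhausts the C stack). Note also that a replacement method must return the modified object, which your version does not do. One way to get around the recursion is to unclass(x) before doing the assignment, so that the default method is called instead: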

`[<-.newclass` <- function(x, i, j, value) {
  x <- unclass(x)
  x[i, j] <- value^2
  x # typically you would also add the class back here
}

(Or, using a somewhat more advanced technique, the body could also be replaced with an explicit next method like this: NextMethod(value = value^2). This plays nicer with inheritance and superclasses.)

And just to verify that it works:

x <- matrix(1:9, 3, 3)
class(x) <- "newclass" 

x[1, 2] <- 9
x
#>      [,1] [,2] [,3]
#> [1,]    1   81    7
#> [2,]    2    5    8
#> [3,]    3    6    9

Perfectly confusing!
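As the comment in the method above hints, you would typically restore the class before returning. A minimal sketch of that variant follows. It uses a separate, hypothetical class name "squaring" so that it does not overwrite the method defined above.

```r
# "squaring" is a hypothetical class name used only for this sketch.
`[<-.squaring` <- function(x, i, j, value) {
  cls <- oldClass(x)     # remember the class attribute
  x <- unclass(x)        # drop it so the default `[<-` is used below
  x[i, j] <- value^2
  class(x) <- cls        # add the class back
  x
}

s <- matrix(1:9, 3, 3)
class(s) <- "squaring"
s[1, 2] <- 9

still_classed <- inherits(s, "squaring")
new_value <- unclass(s)[1, 2]
```

With the class restored, later subassignments on s keep dispatching to the custom method.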


As for the context of Dowle's "do nothing" subassignment example, I believe this was to illustrate that back in R 2.13.0, a custom subassignment method would always cause a deep copy of the object to be made, even if the method itself did nothing at all. (This is no longer the case, since R 3.1.0 I believe.)
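If you want to check the copying behaviour on your own installation, something like the following sketch may help. It assumes your R build supports memory profiling (hence the capabilities("profmem") guard). Whether tracemem() reports any copy depends on the R version, so no expected output is shown.

```r
x <- list(a = 1:10000, b = 1:10000)
class(x) <- "newclass"
`[<-.newclass` <- function(x, i, j, value) x   # do nothing, as in the linked post

can_trace <- capabilities("profmem")   # tracemem() needs memory profiling support
if (can_trace) tracemem(x)
x[1, 2] <- 42L                         # any copy made here is reported by tracemem()
if (can_trace) untracemem(x)

# The method ignores i, j and value, so the contents are unchanged:
expected <- structure(list(a = 1:10000, b = 1:10000), class = "newclass")
unchanged <- identical(x, expected)
```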

Created on 2018-08-15 by the reprex package (v0.2.0).

Mikko Marttila answered Oct 06 '22 21:10
