 

Overloading `$` for named vector in R

Tags:

r

When working with named vectors like

vec <- c(a = 1, b = 2)

I often find myself introducing mistakes by writing vec$a when I should be writing vec["a"] or vec[["a"]] to access the corresponding value with and without the name, respectively.
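To make the three access forms concrete, here is a minimal sketch using the vector defined above. The `msg` variable and the `tryCatch()` wrapper are only there for illustration, so that the error from `vec$a` is captured rather than stopping the script.

```r
vec <- c(a = 1, b = 2)

vec["a"]     # single bracket keeps the name: a named numeric vector of length 1
vec[["a"]]   # double bracket drops the name: the bare value

# `$` is not defined for atomic vectors, so this signals an error;
# tryCatch() captures the error message instead of halting.
msg <- tryCatch(vec$a, error = function(e) conditionMessage(e))
msg
```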

I think vec$a being an error is counter-intuitive, as $ generally extracts named things. This feeling even seems supported e.g. in ?Extract, where the example usage is x$name - isn't that a perfect fit for a named vector?

That got me thinking about the possibility of overloading the $ operator to work on named vectors. However, I am not very experienced with operator overloading in R, and I understand (e.g. from the answers here) that caution is advised when overloading basic operators.

My interconnected questions: Is there a reason why I shouldn't overload $ as described that I'm failing to understand? That is, is there (in some vague sense) a "good" reason why this is not the default in R? If not, how might I sensibly do so?

I understand that in practice this is likely a bad idea, if only for reasons of portability, but I'm still curious.

MSR asked Nov 04 '19 18:11


1 Answer

Method overloading is sometimes a good thing. With primitive functions it is feasible for some classes (e.g., data.table:::$<-.data.table and tibble:::$.tbl_df), but it is not trivial for base classes. In general, I think attempting this is a bad idea.

  • One idiomatic way to do this would be to provide an S3 method for it, such as $.numeric. That would let you tightly control where the method applies: only to a specific type of object (here a numeric vector, so it would not trigger on lists). Unfortunately, $ is a primitive function that dispatches internally, and internal dispatch only happens for objects with an explicit class attribute (those for which is.object() is TRUE). A plain numeric vector has none, so a method such as $.numeric is never called for it.

  • If you're willing to re-class vectors on which you want to apply this, then this can be done:

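    # S3 method for `$` on the custom class "quux"; `name` arrives as a string
    `$.quux` <- function(x, name) x[name]
    vec <- c(a = 1)
    # prepend "quux" so that `$` dispatches to the method above
    class(vec) <- c("quux", class(vec))
    vec$a
    # a 
    # 1 
    vec$b   # a missing name yields NA rather than an error
    # <NA> 
    #   NA 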
    

    Unfortunately, this requires you to re-class any object you want to be able to do this with.

  • Another choice would be to override $ itself:

    `$` <- function(x, name) x[deparse(substitute(name))]
    c(a=1)$a
    # a 
    # 1 
    

    But this has many problems: it affects every normal use of $, including non-vector arguments (try mtcars$mpg and see that it now returns a single-column data.frame instead of the usual vector) and assignment (mtcars$mpg <- ... fails). You could try to catch each of these special cases, but invariably you will miss some corner case or object type, or change behavior that other code assumes, breaking other things.
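Two of the points in the list above can be checked directly. The following is a minimal sketch, not a recommendation: the names `plain`, `res_plain` and `side_effect` are made up for illustration, and the override of `$` is confined to a `local()` block so that it does not leak into the rest of the session.

```r
# 1. A `$.numeric` method is never reached for a plain numeric vector,
#    because the vector carries no class attribute.
`$.numeric` <- function(x, name) x[[name]]
plain <- c(a = 1, b = 2)
is.object(plain)   # FALSE: no class attribute, so no internal dispatch
res_plain <- tryCatch(plain$a, error = function(e) "no dispatch")
res_plain
rm("$.numeric")    # clean up the unused method

# 2. Overriding `$` itself changes its behaviour for every object,
#    including data frames. local() keeps the override contained.
side_effect <- local({
  `$` <- function(x, name) x[deparse(substitute(name))]
  datasets::mtcars$mpg
})
is.data.frame(side_effect)          # a one-column data.frame, not a vector
is.numeric(datasets::mtcars$mpg)    # outside local(), `$` behaves as usual
```

Keeping the override inside `local()` is a deliberate choice here: it shows the side effect without redefining `$` globally, which would affect all later code in the session.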

While I agree that this behavior might seem a little inconsistent, there comes a point where changing this kind of behavior has more second-order effects than you can band-aid. (A close analogy is python2 versus python3: that "transition" started in December 2008 with the first release of python-3, and despite python-2's alleged end-of-life in Jan 2020, it has been neither prompt nor smooth.)

r2evans answered Nov 09 '22 02:11