
What are the dangers of using R attributes?

Tags:

r

Adding user-defined attributes to R objects makes it easy to carry around some additional information glued together with the object of interest. The problem is that it slightly changes how R sees the objects, e.g. a numeric vector with an additional attribute is still numeric, but is.vector() no longer reports it as a vector:

x <- rnorm(100)
class(x)
## [1] "numeric"
is.numeric(x)
## [1] TRUE
is.vector(x)
## [1] TRUE
mode(x)
## [1] "numeric"
typeof(x)
## [1] "double"
attr(x, "foo") <- "this is my attribute"
class(x)
## [1] "numeric"
is.numeric(x)
## [1] TRUE
is.vector(x)
## [1] FALSE  # <-- here!
mode(x)
## [1] "numeric"
typeof(x)
## [1] "double"

Can this lead to any problems? What I'm thinking about is adding some attributes to common R objects and then passing them to other methods. What is the risk of something breaking simply because I added attributes to standard R objects (e.g. vector, matrix, data.frame etc.)?

Notice that I'm not asking about creating my own classes. For the sake of simplicity we can also assume that there won't be any conflicts in the names of the attributes (e.g. using the dim attribute). Let's also assume that it is not a problem if some method at some point drops my attribute; that is an acceptable risk.

Tim asked Apr 12 '17 12:04


1 Answer

In my (somewhat limited) experience, adding new attributes to an object hasn't ever broken anything. The only likely scenario I can think of where it would break something is a function that requires an object to have a specific set of attributes and nothing else. I can't think of a time when I've encountered that, though. Most functions, especially S3 methods, will just ignore attributes they don't need.
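To make that scenario a bit more concrete, here is a small sketch of my own (not taken from any particular package) of the kind of strict check that does notice an extra attribute. The attribute name "foo" is just a placeholder.

```r
x <- c(1, 2, 3)
y <- x
attr(y, "foo") <- "bar"   # "foo" is an arbitrary, made-up attribute name

# Exact comparison takes attributes into account
same_exact <- identical(x, y)

# all.equal() also compares attributes unless told otherwise
same_default <- isTRUE(all.equal(x, y))
same_values  <- isTRUE(all.equal(x, y, check.attributes = FALSE))

# is.vector() is FALSE for anything with attributes other than names;
# as.vector() strips them again for atomic vectors
vec_before <- is.vector(y)
vec_after  <- is.vector(as.vector(y))

print(c(same_exact, same_default, same_values, vec_before, vec_after))
```

So code that guards its input with is.vector() or compares against a reference with identical() is where I would expect an added attribute to cause trouble, if anywhere.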

You're more likely to see problems arise if you remove attributes.

The reason you won't see a lot of problems stemming from additional attributes is that methods are dispatched on the class of an object. As long as the class doesn't change, methods will be dispatched in much the same way. However, this doesn't mean that existing methods will know what to do with your new attributes. Take the following example: after adding a new_attr attribute to both x and y, and then adding them, the result adopts the attribute of x. What happened to the attribute of y? The default + function doesn't know what to do with conflicting attributes of the same name, so it just takes the first one (more details in the R Language Definition, thanks Brodie).

x <- 1:10
y <- 10:1

attr(x, "new_attr") <- "yippy"
attr(y, "new_attr") <- "ki yay"

x + y

 [1] 11 11 11 11 11 11 11 11 11 11
attr(,"new_attr")
[1] "yippy"

In a different example, if we give x and y attributes with different names, x + y produces an object that preserves both attributes.

x <- 1:10
y <- 10:1

attr(x, "new_attr") <- "yippy"
attr(y, "another_attr") <- "ki yay"

x + y
 [1] 11 11 11 11 11 11 11 11 11 11
attr(,"another_attr")
[1] "ki yay"
attr(,"new_attr")
[1] "yippy"

On the other hand, mean(x) doesn't even try to preserve the attributes. I don't know of a good way to predict which functions will and won't preserve attributes. There's probably some reliable mnemonic you could use in base R (aggregation vs. vectorized, perhaps?), but I think there's a separate principle that ought to be considered.
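As a quick and non-exhaustive check of that guess, the sketch below applies a few base R functions to a vector carrying a custom attribute and looks at what survives. The selection of functions is my own; it is only meant to show the pattern, not to be a rule.

```r
x <- c(4, 9, 16)
attr(x, "new_attr") <- "yippy"

# Element-wise operations: the attribute is carried over
kept_sqrt  <- attr(sqrt(x), "new_attr")
kept_arith <- attr(x * 2, "new_attr")

# Summaries, subsetting and concatenation: the attribute is gone
after_mean   <- attributes(mean(x))
after_subset <- attributes(x[1:2])
after_concat <- attributes(c(x, x))

print(list(sqrt = kept_sqrt, arith = kept_arith,
           mean = after_mean, subset = after_subset, concat = after_concat))
```

Note that plain subsetting with [ already drops the attribute, so it can disappear much earlier in a pipeline than you might expect.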

If preservation of your new attributes is important, you should define a new class that preserves the inheritance of the old class.

With a new class, you can write methods that extend the generics and handle the attributes in whichever way you want. Whether or not you should define a new class and write its methods is very much dependent on how valuable any new attributes you add are to the future work you will be doing.
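A minimal sketch of what that could look like with S3 is below. The class name "tagged", the attribute name "note" and the constructor are all hypothetical names I made up for illustration; only the [ method is shown, and you would need similar methods for any other generic that should keep the attribute.

```r
# Hypothetical constructor: puts "tagged" in front of the existing class
# so that the object still inherits from whatever it was before
tagged <- function(x, note) {
  structure(x, note = note, class = c("tagged", class(x)))
}

# Subsetting method: let the default method do the work, then put the
# custom attribute and the class back on the result
`[.tagged` <- function(x, ...) {
  out <- NextMethod()
  attr(out, "note") <- attr(x, "note")
  class(out) <- class(x)
  out
}

v <- tagged(c(1.5, 2.5, 3.5), note = "metres")
w <- v[2:3]

attr(w, "note")        # the attribute survives subsetting
inherits(w, "tagged")  # and so does the class
is.numeric(w)          # the underlying type is unchanged
```

Whether this is worth the effort depends on how many generics you need to cover; every function without a method for the new class will fall back to its usual behaviour and may still drop the attribute.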

So in general, adding new attributes is very unlikely to break anything in R. But without adding a new class and methods to handle the new attributes, I would be very cautious about interpreting the meaning of those attributes after they've been passed through other functions.

Benjamin answered Nov 15 '22 22:11