
attr(*, ".internal.selfref")=<externalptr> appearing in data.table in RStudio

I am a new user of the R data.table package, and I have noticed something unusual in my data.tables that I have not found explained in the documentation or elsewhere on this site.

When using the data.table package within RStudio and viewing a specific data.table in the 'Environment' panel, I see the following string appearing at the end of the data.table:

attr(*, ".internal.selfref")=<externalptr>

If I print the same data.table within the Console, this string does not appear.

Is this a bug, or just an inherent feature of data.table (or RStudio)? Should I be concerned about whether this is affecting how these data are handled by downstream processes?

The versions I am running are:
data.table 1.9.6, RStudio 0.99.447, OS X 10.10.5

Apologies in advance if this is just me being an ignorant newbie.

Asked Oct 20 '15 15:10 by SubstantiaN


2 Answers

I actually asked Matt Dowle, the primary author of the data.table package, this very question a little while ago.

Is this a bug, or just an inherent feature of data.table (or RStudio)?

Apparently this attribute is used internally by data.table. It isn't a bug in RStudio; in fact, RStudio is simply doing its job of showing the object's attributes.
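A minimal sketch, assuming data.table is installed, of why the line shows up in one place and not the other: the attribute is present on every data.table, but print() uses data.table's own print method, which omits attributes, while str() lists them. The Environment pane's expanded view appears to use str()-style output, which would explain the difference. The column names `x` and `y` are purely illustrative.

```r
library(data.table)

dt <- data.table(x = 1:3, y = c("a", "b", "c"))

# data.table's print method shows only the rows and columns, no attributes
print(dt)

# str() also lists attributes, including the .internal.selfref external pointer
str(dt)

# The attribute can be checked for directly as well
".internal.selfref" %in% names(attributes(dt))
```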

Should I be concerned about whether this is affecting how these data are handled by downstream processes?

No, this isn't going to affect anything.

Answered Oct 20 '22 14:10 by Daniel Benjamin Joplin


For those who are curious about why this attribute is created, I believe it's explained in the data.table manual under the section for setkey():

In v1.7.8, the key<- syntax was deprecated. The <- method copies the whole table and we know of no way to avoid that copy without a change in R itself. Please use the set* functions instead, which make no copy at all. setkey accepts unquoted column names for convenience, whilst setkeyv accepts one vector of column names.

The problem (for data.table) with the copy by key<- (other than being slower) is that R doesn’t maintain the over allocated truelength, but it looks as though it has. Adding a column by reference using := after a key<- was therefore a memory overwrite and eventually a segfault; the over allocated memory wasn’t really there after key<-’s copy.

data.tables now have an attribute .internal.selfref to catch and warn about such copies. This attribute has been implemented in a way that is friendly with identical() and object.size(). For the same reason, please use the other set* functions which modify objects by reference, rather than using the <- operator which results in copying the entire object.
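A minimal sketch of the by-reference style the manual recommends, assuming any reasonably recent data.table: a freshly created table has spare, over-allocated column slots (truelength() exceeds length()), and setkey() and := then modify it in place, which address() can confirm. The deprecated key<- form is deliberately left out; the column names are illustrative only.

```r
library(data.table)

dt <- data.table(x = c(3L, 1L, 2L))

# Over-allocation: spare column slots exist beyond the columns in use
truelength(dt) > length(dt)

# Memory address of the table before any modification
before <- address(dt)

# Both operations work by reference, so no copy of the table is made
setkey(dt, x)        # sorts by x in place and records x as the key
dt[, y := x * 2L]    # fills one of the spare slots with a new column

# Still the same object in memory
identical(before, address(dt))
key(dt)
```

Because nothing is copied, the .internal.selfref check stays valid and := never has to fall back to taking a copy.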

Answered Oct 20 '22 14:10 by asafr