
Changing factor levels on a column with setattr is sensitive to how the column was created

Tags:

r

data.table

I want to change factor levels of a column using setattr. However, when the column is selected the standard data.table way (dt[ , col]), the levels are not updated. On the other hand, when selecting the column in an unorthodox way in a data.table setting—namely using $—it works.

library(data.table)

# Some data 
d <- data.table(x = factor(c("b", "a", "a", "b")), y = 1:4)
d
#    x y
# 1: b 1
# 2: a 2
# 3: a 3
# 4: b 4

# We want to change levels of 'x' using setattr
# New desired levels
lev <- c("a_new", "b_new")

# Select column in the standard data.table way 
setattr(x = d[ , x], name = "levels", value = lev)

# Levels are not updated
d
#    x y
# 1: b 1
# 2: a 2
# 3: a 3
# 4: b 4

# Select column in a non-standard data.table way using $
setattr(x = d$x, name = "levels", value = lev)

# Levels are updated
d
#        x y
# 1: b_new 1
# 2: a_new 2
# 3: a_new 3
# 4: b_new 4

# Just check if d[ , x] really is the same as d$x
d <- data.table(x = factor(c("b", "a", "a", "b")), y = 1:4)
identical(d[ , x], d$x)
# [1] TRUE
# Yes, it seems so

It feels like I'm missing some data.table (R?) basics here. Can anyone explain what's going on?


I have found two other posts on setattr and levels:

setattr on levels preserving unwanted duplicates (R data.table)

How does one change the levels of a factor column in a data.table

Both of them used $ to select the column. Neither of them mentioned the [ , col] way.

Henrik asked Jan 08 '17 23:01



1 Answer

It might help to understand if you look at the address from both expressions:

address(d$x)
# [1] "0x10e4ac4d8"
address(d$x)
# [1] "0x10e4ac4d8"


address(d[,x])
# [1] "0x105e0b520"
address(d[,x])
# [1] "0x105e0a600"

Note that the address returned for d$x stays the same across calls, while the address returned for d[,x] changes every time. A changing address indicates that d[,x] returns a new copy of the column on each call, so setattr modifies only that temporary copy and has no effect on the original data.table. This is also why identical(d[ , x], d$x) is TRUE: identical compares values, not memory locations.
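To make the difference concrete, here is a minimal sketch that reuses d and lev from the question. The name copied_col is hypothetical and only there for illustration, and the sketch assumes d[, x] copies the column as the addresses above suggest:

```r
library(data.table)

d <- data.table(x = factor(c("b", "a", "a", "b")), y = 1:4)
lev <- c("a_new", "b_new")

# d[, x] hands back a copy of the column (assumption based on the
# differing addresses shown above); keep it in a variable to inspect it
copied_col <- d[, x]

# Setting levels on the copy changes only the copy
setattr(copied_col, "levels", lev)
levels(copied_col)  # the new levels
levels(d$x)         # still the original levels of the column in d

# d$x refers to the column vector stored inside d,
# so setattr changes the data.table by reference
setattr(d$x, "levels", lev)
levels(d$x)         # now the new levels
```

If the goal is just to relabel the levels in place, selecting the column with $ as in the two linked posts appears to be the safer choice.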

Psidom answered Nov 02 '22 14:11
