I find the logic of data.table inconsistent for the two operations below:
Operation 1:
df1<-data.table(a=c(1,2))
list1<-list(c(1,2), 1)
df1[,b:=list1]
#> df1
# a b
#1: 1 1,2
#2: 2 1
Operation 2: (data.table treats a singleton list as if I supplied a vector)
df2<-data.table(a=c(1))
list2<-list(c(1,2))
df2[, b:=list2]
#Warning message:
#In `[.data.table`(df2, , `:=`(b, list2)) :
# Supplied 2 items to be assigned to 1 items of column 'b' (1 unused)
#> df2
# a b
#1: 1 1
I would like the output in the second case to be:
# a b
#1: 1 1,2
To unify both cases, I can do:
df1[, b:=list(list1)]
df2[, b:=list(list2)]
Is this the best solution? Is there no option for data.table to not unnest a singleton list? Are there any extra operations, performance-wise, in the first case when I use b:=list(list1)?
Copying my answer from https://stackoverflow.com/a/54797914/2490497
I cannot suggest a duplicate because "This question does not have an upvoted or accepted answer".
This is a very good question that touches on a design decision about the := operator.
For simple calls using := as an operator, like col := val, we decided to wrap val into a list automatically. This decision was made to make it more convenient for users to assign a single column.
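As a minimal sketch of that auto-wrapping (the table dt and the column names b and b2 are made up for illustration, they are not from the question), the two assignments below should produce identical columns:

```r
library(data.table)

# Hypothetical example table, not taken from the question
dt <- data.table(a = c(1, 2))

# Operator form: the plain vector on the RHS is wrapped into a list for us
dt[, b := a * 2]

# Same assignment with the RHS wrapped explicitly
dt[, b2 := list(a * 2)]

# Both columns hold the same numeric vector
identical(dt$b, dt$b2)
```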
When you use the function-call form, ":="(col = val), we no longer wrap val into a list, because that form is already the extended one. There, := behaves as an alias for list, except that it updates in place. You can always check what the updated column will be by changing := into list (or .), as in .(col = val).
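A rough sketch of both points, reusing df2 and list2 from the question (the name preview is introduced here only for illustration). It assumes a reasonably recent data.table; the printed output is left out on purpose:

```r
library(data.table)

df2 <- data.table(a = c(1))
list2 <- list(c(1, 2))

# Preview: swap := for .() to inspect the column that would be created
preview <- df2[, .(b = list2)]
str(preview$b)

# Function-call form: list2 is taken as the column itself, with no unnesting
df2[, `:=`(b = list2)]
str(df2$b)
```

With this form the singleton list should end up as a one-row list column, which is the result the question asks for.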
Note that even when using := as an operator, you still have to provide the RHS as a list if you are creating 2+ columns: c("col1","col2") := list(val1, val2).
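For instance, a small sketch of the multi-column case (dt and the column names col1 to col4 are hypothetical, chosen only for this example):

```r
library(data.table)

# Hypothetical example table
dt <- data.table(a = c(1, 2))

# Two new columns: the RHS is a list with one element per column
dt[, c("col1", "col2") := list(a + 1, a * 10)]

# If one of the new columns is itself a list column,
# it needs its own inner list with one element per row
dt[, c("col3", "col4") := list(a, list(c(1, 2), 1))]

str(dt)
```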