in my table, some cells are vectors instead of single value, i.e. the column is a list instead of vector:
dt1 <- data.table(
colA= c('A1','A2','A3'),
colB=list('B1',c('B2a','B2b'),'B3'),
colC= c('C1','C2','C3'),
colD= c('D1','D2','D3')
)
dt1
# colA colB colC colD
#1: A1 B1 C1 D1
#2: A2 B2a,B2b C2 D2
#3: A3 B3 C3 D3
I need to reshape it to a long format unlisting that column colB
. So far I do it like this:
dt1[,.(colB=unlist(colB)),by=.(colA,colC,colD)]
# colA colC colD colB
#1: A1 C1 D1 B1
#2: A2 C2 D2 B2a
#3: A2 C2 D2 B2b
#4: A3 C3 D3 B3
it does the job but I don't like that I have to indicate all other column names explicitly in by=
. Is there better way to do this?
(I'm sure it's already answered elsewhere but I couldn't find it so far)
P.S. ideally I would like to manage without any external packages
The unlist R function converts a list to a single vector.
To sort a data frame in R, use the order( ) function. By default, sorting is ASCENDING. Prepend the sorting variable by a minus sign to indicate DESCENDING order.
Promoting my comment to an answer. Using:
dt1[,.(colB = unlist(colB)), by = setdiff(names(dt1), 'colB')]
gives:
colA colC colD colB 1: A1 C1 D1 B1 2: A2 C2 D2 B2a 3: A2 C2 D2 B2b 4: A3 C3 D3 B3
Or as an alternative (a slight variation of @Frank's proposal):
dt1[rep(dt1[,.I], lengths(colB))][, colB := unlist(dt1$colB)][]
I think @Jaap's is easiest, but here's another alternative to chew over:
#create ID column
dt1[ , ID := .I]
#unnest colB, keep ID column
dt_unnest = dt1[ , .(ID = rep(ID, lengths(colB)),
colB = unlist(colB))]
#merge
dt_unnest = dt_unnest[dt1[ , !'colB'], on = 'ID']
# ID colB colA colC colD
# 1: 1 B1 A1 C1 D1
# 2: 2 B2a A2 C2 D2
# 3: 2 B2b A2 C2 D2
# 4: 3 B3 A3 C3 D3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With