Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to 'unlist' a column in a data.table

Tags:

r

data.table

in my table, some cells are vectors instead of single value, i.e. the column is a list instead of vector:

dt1 <- data.table(
  colA=   c('A1','A2','A3'), 
  colB=list('B1',c('B2a','B2b'),'B3'),
  colC=   c('C1','C2','C3'), 
  colD=   c('D1','D2','D3')
)

dt1
#   colA    colB colC colD
#1:   A1      B1   C1   D1
#2:   A2 B2a,B2b   C2   D2
#3:   A3      B3   C3   D3 

I need to reshape it to a long format unlisting that column colB. So far I do it like this:

dt1[,.(colB=unlist(colB)),by=.(colA,colC,colD)]
#   colA colC colD colB
#1:   A1   C1   D1   B1
#2:   A2   C2   D2  B2a
#3:   A2   C2   D2  B2b
#4:   A3   C3   D3   B3

it does the job but I don't like that I have to indicate all other column names explicitly in by=. Is there better way to do this?
(I'm sure it's already answered elsewhere but I couldn't find it so far)

P.S. ideally I would like to manage without any external packages

like image 991
Vasily A Avatar asked Jun 02 '17 20:06

Vasily A


People also ask

What does unlist in R?

The unlist R function converts a list to a single vector.

How do I sort a data table in R?

To sort a data frame in R, use the order( ) function. By default, sorting is ASCENDING. Prepend the sorting variable by a minus sign to indicate DESCENDING order.


2 Answers

Promoting my comment to an answer. Using:

dt1[,.(colB = unlist(colB)), by = setdiff(names(dt1), 'colB')]

gives:

   colA colC colD colB
1:   A1   C1   D1   B1
2:   A2   C2   D2  B2a
3:   A2   C2   D2  B2b
4:   A3   C3   D3   B3

Or as an alternative (a slight variation of @Frank's proposal):

dt1[rep(dt1[,.I], lengths(colB))][, colB := unlist(dt1$colB)][]
like image 33
Jaap Avatar answered Oct 09 '22 00:10

Jaap


I think @Jaap's is easiest, but here's another alternative to chew over:

#create ID column
dt1[ , ID := .I]

#unnest colB, keep ID column
dt_unnest = dt1[ , .(ID = rep(ID, lengths(colB)),
                     colB = unlist(colB))]
#merge
dt_unnest = dt_unnest[dt1[ , !'colB'], on = 'ID']
#    ID colB colA colC colD
# 1:  1   B1   A1   C1   D1
# 2:  2  B2a   A2   C2   D2
# 3:  2  B2b   A2   C2   D2
# 4:  3   B3   A3   C3   D3
like image 186
MichaelChirico Avatar answered Oct 09 '22 01:10

MichaelChirico