irregular list of lists to dataframe

Tags: list, dataframe, r

I need to convert an irregular list of lists to a data.frame in wide format (i.e. every column must have the same number of rows) and I just can't figure out how to do it. The list looks something like this:

[[1]]
[1] 14

[[2]]
[1] 26

[[3]]
[1] 20 21 22 23

[[4]]
[1] 21 22

[[5]]
[1] 25

[[6]]
[1] 17 21 23

I've tried various approaches using for loops and/or sapply, but nothing works. The list elements having different lengths scuppers every attempt I've made. It occurs to me there must be a fairly straightforward way to do this. Mustn't there? Can anyone advise?

user2498193 asked Jan 13 '14 23:01



2 Answers

Here's an sapply / mapply example:

#  Data
set.seed(1)
ll <- replicate( 4 , runif( sample(4,1) ) )
str(ll)
#List of 4
# $ : num [1:2] 0.372 0.573
# $ : num [1:4] 0.202 0.898 0.945 0.661
# $ : num [1:3] 0.0618 0.206 0.1766
# $ : num [1:3] 0.384 0.77 0.498

#  Find length of each list element
len <- sapply(ll,length)

#  Longest gives number of rows
n <- max( len )

#  Number of NAs needed to pad each element up to the longest (this overwrites len)
len <- n - len

#  Output
mapply( function(x,y) c( x , rep( NA , y ) ) , ll , len )
#          [,1]      [,2]       [,3]      [,4]
#[1,] 0.3721239 0.2016819 0.06178627 0.3841037
#[2,] 0.5728534 0.8983897 0.20597457 0.7698414
#[3,]        NA 0.9446753 0.17655675 0.4976992
#[4,]        NA 0.6607978         NA        NA

Note that the output is a matrix (one column per list element), so wrap it in data.frame() if you need a data.frame.
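As a rough sketch, here is the same padding idea applied to the list from the question and wrapped in data.frame(). The list is typed in by hand from the printed output, and the name pad is my own choice to avoid overwriting len.

```r
# The list from the question, typed in by hand (an assumption about the real data)
ll <- list(14, 26, c(20, 21, 22, 23), c(21, 22), 25, c(17, 21, 23))

len <- sapply(ll, length)   # length of each list element
pad <- max(len) - len       # number of NAs each element needs

# Pad each element with NAs, bind the results as columns of a matrix,
# then convert that matrix to a data.frame
m  <- mapply(function(x, y) c(x, rep(NA, y)), ll, pad)
df <- data.frame(m)
str(df)
```

This gives one column per list element (X1 to X6) and as many rows as the longest element.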


Row-wise filling and returning a data.frame

data.frame( t( mapply( function(x,y) c( x , rep( NA , y ) ) , ll , len ) ) )
#          X1        X2        X3        X4
#1 0.37212390 0.5728534        NA        NA
#2 0.20168193 0.8983897 0.9446753 0.6607978
#3 0.06178627 0.2059746 0.1765568        NA
#4 0.38410372 0.7698414 0.4976992        NA
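
A shorter base R alternative, offered as a sketch rather than part of the approach above: assigning a larger length to a vector pads it with NA, so `length<-` can do the padding directly. This assumes R 3.2.0 or later for lengths(), and the list is again typed in from the question.

```r
# The list from the question, typed in by hand (an assumption about the real data)
ll <- list(14, 26, c(20, 21, 22, 23), c(21, 22), 25, c(17, 21, 23))

n <- max(lengths(ll))               # length of the longest element

# `length<-`(x, n) extends x to length n, filling with NA
rows <- lapply(ll, `length<-`, n)

# One row per list element
wide <- as.data.frame(do.call(rbind, rows))
wide
```

Use do.call(cbind, rows) instead if you want one column per list element.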
Simon O'Hanlon answered Oct 23 '22 14:10



One straightforward approach would be to get the data into a "long" form first (for example, using "melt"), adding a "times" variable, and then using dcast or reshape to get the data back into a new "wide" form.

These examples use ll from @Simon's answer:

Here's a "reshape2" approach:

library(reshape2)
ll2 <- melt(ll)
ll2$time <- ave(ll2$L1, ll2$L1, FUN = seq_along)
dcast(ll2, L1 ~ time, value.var="value")
#   L1          1         2         3         4
# 1  1 0.37212390 0.5728534        NA        NA
# 2  2 0.20168193 0.8983897 0.9446753 0.6607978
# 3  3 0.06178627 0.2059746 0.1765568        NA
# 4  4 0.38410372 0.7698414 0.4976992        NA

## Or, for the other orientation:
dcast(ll2, time ~ L1, value.var="value")
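
For the reshape route mentioned at the top, a minimal base R sketch might look like the following. The column names id, time and value are my own choices, and the list is typed in from the question rather than taken from @Simon's ll.

```r
# The list from the question, typed in by hand (an assumption about the real data)
ll <- list(14, 26, c(20, 21, 22, 23), c(21, 22), 25, c(17, 21, 23))

# Long form: one row per value, with the list index and the position within it
long <- data.frame(
  id    = rep(seq_along(ll), lengths(ll)),
  time  = sequence(lengths(ll)),
  value = unlist(ll)
)

# Back to wide form: one row per list element, one column per position
wide <- reshape(long, idvar = "id", timevar = "time", direction = "wide")
wide
```

The value columns come out as value.1 to value.4, with NA wherever an element was shorter than the longest one.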

You can also use the "data.table" package for this, provided you have at least version 1.8.11 of the package:

library(data.table)
library(reshape2)
packageVersion("data.table") ## Need at least V 1.8.11
# [1] ‘1.8.11’

DT <- data.table(ll)
DTL <- DT[, unlist(ll), by = 1:nrow(DT)]
DTL[, time := sequence(.N), by = nrow]
dcast.data.table(DTL, nrow ~ time, value.var="V1")
#    nrow          1         2         3         4
# 1:    1 0.37212390 0.5728534        NA        NA
# 2:    2 0.20168193 0.8983897 0.9446753 0.6607978
# 3:    3 0.06178627 0.2059746 0.1765568        NA
# 4:    4 0.38410372 0.7698414 0.4976992        NA

## Or, for the other orientation
dcast.data.table(DTL, time ~ nrow, value.var="V1")

Both of these have the added advantage of conveniently replacing NA with anything else you wish to use, via the fill argument of dcast.
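
To illustrate the fill point, here is a small sketch with reshape2 that uses 0 as the fill value. It assumes reshape2 is installed; the list is typed in from the question, and the name wide0 is just for illustration.

```r
library(reshape2)

# The list from the question, typed in by hand (an assumption about the real data)
ll <- list(14, 26, c(20, 21, 22, 23), c(21, 22), 25, c(17, 21, 23))

ll2 <- melt(ll)                                     # long form: columns value and L1
ll2$time <- ave(ll2$L1, ll2$L1, FUN = seq_along)    # position within each element

# fill = 0 puts 0 in the cells that would otherwise be NA
wide0 <- dcast(ll2, L1 ~ time, value.var = "value", fill = 0)
wide0
```

Any other value can be passed to fill in the same way.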

A5C1D2H2I1M1N2O1R2T1 answered Oct 23 '22 15:10
