I'm having a problem. I need to convert an irregular list of lists to a data.frame in wide format (i.e. I need the same number of rows) and I just can't figure out how to do it. The list looks something like this:
[[1]]
[1] 14
[[2]]
[1] 26
[[3]]
[1] 20 21 22 23
[[4]]
[1] 21 22
[[5]]
[1] 25
[[6]]
[1] 17 21 23
I've tried various approaches using for loops and/or sapply, but nothing works. The list elements being different lengths scuppers every attempt I've made. It occurs to me there must be a fairly straightforward way to do this. Mustn't there? Can anyone advise?
Here's an sapply / mapply example...
# Data
set.seed(1)
ll <- replicate( 4 , runif( sample(4,1) ) )
str(ll)
#List of 4
# $ : num [1:2] 0.372 0.573
# $ : num [1:4] 0.202 0.898 0.945 0.661
# $ : num [1:3] 0.0618 0.206 0.1766
# $ : num [1:3] 0.384 0.77 0.498
# Find length of each list element
len <- sapply(ll,length)
# Longest gives number of rows
n <- max( len )
# Number of NAs to fill for column shorter than longest
len <- n - len
# Output
mapply( function(x,y) c( x , rep( NA , y ) ) , ll , len )
# [,1] [,2] [,3] [,4]
#[1,] 0.3721239 0.2016819 0.06178627 0.3841037
#[2,] 0.5728534 0.8983897 0.20597457 0.7698414
#[3,] NA 0.9446753 0.17655675 0.4976992
#[4,] NA 0.6607978 NA NA
Note that the output is a matrix, so you need to wrap it with data.frame(). Transposing with t() first puts each list element in its own row:
data.frame( t( mapply( function(x,y) c( x , rep( NA , y ) ) , ll , len ) ) )
# X1 X2 X3 X4
#1 0.37212390 0.5728534 NA NA
#2 0.20168193 0.8983897 0.9446753 0.6607978
#3 0.06178627 0.2059746 0.1765568 NA
#4 0.38410372 0.7698414 0.4976992 NA
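The padding step above can also be sketched with the base R replacement function `length<-`, which extends a vector with NA when given a longer length. This is only a minimal sketch, not part of the original answer; it uses the list from the question, and the names `padded` and `wide` are made up for illustration.

```r
# The ragged list from the question
ll <- list(14, 26, c(20, 21, 22, 23), c(21, 22), 25, c(17, 21, 23))

# The longest element determines the number of columns
n <- max(lengths(ll))

# `length<-` extends each vector to length n, padding with NA
padded <- lapply(ll, `length<-`, n)

# Bind the padded vectors as rows, then convert the matrix to a data.frame
wide <- as.data.frame(do.call(rbind, padded))
wide
```

Each list element becomes one row, so no transpose is needed. Note that lengths() requires R 3.2.0 or later; sapply(ll, length) is the equivalent on older versions.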
One straightforward approach would be to get the data into a "long" form first (for example, using melt), add a "time" variable, and then use dcast or reshape to get the data back into a new "wide" form.
These examples use ll from @Simon's answer:
Here's a "reshape2" approach:
library(reshape2)
ll2 <- melt(ll)
ll2$time <- ave(ll2$L1, ll2$L1, FUN = seq_along)
dcast(ll2, L1 ~ time, value.var="value")
# L1 1 2 3 4
# 1 1 0.37212390 0.5728534 NA NA
# 2 2 0.20168193 0.8983897 0.9446753 0.6607978
# 3 3 0.06178627 0.2059746 0.1765568 NA
# 4 4 0.38410372 0.7698414 0.4976992 NA
## Or, for the other orientation:
dcast(ll2, time ~ L1, value.var="value")
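The base R reshape() route mentioned above is not shown in this answer, so here is a minimal sketch of what it might look like. It uses the list from the question rather than the random ll, and the column names id, time and value are my own choices for illustration.

```r
# The ragged list from the question
ll <- list(14, 26, c(20, 21, 22, 23), c(21, 22), 25, c(17, 21, 23))

# Long form: one row per value, with the list index (id)
# and the position of the value within its element (time)
long <- data.frame(
  id    = rep(seq_along(ll), lengths(ll)),
  time  = sequence(lengths(ll)),
  value = unlist(ll)
)

# Wide form: one row per list element, one column per position;
# positions that do not exist for an element are filled with NA
wide <- reshape(long, idvar = "id", timevar = "time", direction = "wide")
wide
```

The result has the columns id, value.1, value.2, value.3 and value.4, so no extra package is needed, at the cost of a more verbose call.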
You can also use the "data.table" package for this, if you are using at least version 1.8.11 of the package:
library(data.table)
library(reshape2)
packageVersion("data.table") ## Need at least V 1.8.11
# [1] ‘1.8.11’
DT <- data.table(ll)
DTL <- DT[, unlist(ll), by = 1:nrow(DT)]
DTL[, time := sequence(.N), by = nrow]
dcast.data.table(DTL, nrow ~ time, value.var="V1")
# nrow 1 2 3 4
# 1: 1 0.37212390 0.5728534 NA NA
# 2: 2 0.20168193 0.8983897 0.9446753 0.6607978
# 3: 3 0.06178627 0.2059746 0.1765568 NA
# 4: 4 0.38410372 0.7698414 0.4976992 NA
## Or, for the other orientation
dcast.data.table(DTL, time ~ nrow, value.var="V1")
Both of these approaches have the added advantage that NA can conveniently be replaced with anything else you wish to use, through the fill argument of dcast.
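As a rough illustration of replacing NA, the sketch below passes fill = 0 to reshape2's dcast. It assumes the reshape2 package is installed, uses the list from the question instead of the random ll, and the name `filled` is made up for the example.

```r
library(reshape2)

# The ragged list from the question
ll <- list(14, 26, c(20, 21, 22, 23), c(21, 22), 25, c(17, 21, 23))

# Long form: melt() on a list gives the columns "value" and "L1" (list index)
ll2 <- melt(ll)

# Position of each value within its list element
ll2$time <- ave(ll2$L1, ll2$L1, FUN = seq_along)

# Wide form, with missing combinations filled with 0 instead of NA
filled <- dcast(ll2, L1 ~ time, value.var = "value", fill = 0)
filled
```

Any other value of a compatible type can be used in place of 0.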