Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unlist all list elements in a dataframe

Tags:

dataframe

r

I have a data frame with the following classes of variables for each column:

"date" "numeric" "numeric" "list" "list" "numeric"

The data in each row looks like this:

    1978-01-01, 12.5, 6.3, c(0,0,0.25,0.45,0.3), c(0,0,0,0.1,0.9), 72

I would like to transform it into a matrix or a data frame with one value per column, so the result should look like this:

1978-01-01, 12.5, 6.3, 0, 0, 0.25, 0.45, 0.3, 0, 0, 0, 0.1, 0.9, 72

I have tried using:

j<-unlist(input)
output<-matrix(j,nrow=nrow(input),ncol=length(j)/nrow(input))

But it messes up the order of the rows in the output.

Any idea?

Additional information:

The above example is slightly simplified and dput(head(input)) returns the following sample:

structure(list(DATE = structure(c(2924, 2925, 2926, 2927, 2928, 
2929), class = "Date"), TEMP_MEAN_M0 = c(-7.625, -7.375, -6, 
-5.5, -7.625, -9.625), SLP_MEAN_M0 = c(1012.125, 991.975, 989.825, 
986.675, 988.95, 993.075), WIND_DIR_RF_M0 = structure(list(`2.counts` = c(0, 
0.625, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0, 0, 0, 0.125), `3.counts` = c(0.75, 
0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `4.counts` = c(0.375, 
0.125, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0.125, 0, 0, 0), `5.counts` = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 
0, 0, 0.125, 0.375, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0), `6.counts` = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 
0, 0.25, 0.125, 0.25, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0), `7.counts` = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0.125, 0.5, 0.375, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("2.counts", 
"3.counts", "4.counts", "5.counts", "6.counts", "7.counts")), 
    CEIL_HGT_RF_M0 = structure(list(`2.counts` = c(0.625, 0, 
    0, 0, 0, 0, 0, 0, 0, 0.375), `3.counts` = c(0.75, 0.125, 
    0, 0.125, 0, 0, 0, 0, 0, 0), `4.counts` = c(0.25, 0.125, 
    0, 0.125, 0, 0, 0, 0, 0.25, 0.25), `5.counts` = c(0, 0, 0, 
    0, 0, 0, 0, 0, 0.125, 0.875), `6.counts` = c(0, 0, 0, 0, 
    0, 0, 0, 0, 0, 1), `7.counts` = c(0, 0, 0, 0, 0, 0, 0, 0, 
    0, 1)), .Names = c("2.counts", "3.counts", "4.counts", "5.counts", 
    "6.counts", "7.counts")), WIND_SPD_MEAN_M0 = c(12.8125, 18.7375, 
    6.175, 8.175, 10.5375, 16.5375)), .Names = c("DATE", "TEMP_MEAN_M0", 
"SLP_MEAN_M0", "WIND_DIR_RF_M0", "CEIL_HGT_RF_M0", "WIND_SPD_MEAN_M0"
), row.names = c(NA, 6L), class = "data.frame")
like image 399
jatobat Avatar asked Apr 10 '13 16:04

jatobat


1 Answers

This is somewhat messy and probably pretty inefficient, but should help get you started:

Here's some sample data:

mydf <- data.frame(Date = as.Date(c("1978-01-01", "1978-01-02")),
                   V1 = c(10, 10),
                   V2 = c(11, 11))
mydf$V3 <- list(c(1:10),
                c(11:20))
mydf$V4 <- list(c(21:25),
                c(26:30))
mydf
#         Date V1 V2                                     V3                 V4
# 1 1978-01-01 10 11          1, 2, 3, 4, 5, 6, 7, 8, 9, 10 21, 22, 23, 24, 25
# 2 1978-01-02 10 11 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 26, 27, 28, 29, 30

And, a little function that checks to see which columns are lists, and for those columns, rbinds them together, and ultimately cbinds them with the columns that are not lists.

myFun <- function(data) {
  temp1 <- sapply(data, is.list)
  temp2 <- do.call(
    cbind, lapply(data[temp1], function(x) 
      data.frame(do.call(rbind, x), check.names=FALSE)))
  cbind(data[!temp1], temp2)
}

myFun(mydf)
#         Date V1 V2 V3.1 V3.2 V3.3 V3.4 V3.5 V3.6 V3.7 V3.8 V3.9 V3.10 V4.1
# 1 1978-01-01 10 11    1    2    3    4    5    6    7    8    9    10   21
# 2 1978-01-02 10 11   11   12   13   14   15   16   17   18   19    20   26
#   V4.2 V4.3 V4.4 V4.5
# 1   22   23   24   25
# 2   27   28   29   30

This will only work if each "column" list contain vectors of the same length (otherwise base R's rbind will not work).


Update

Revisiting this question half a day later, I see that another user (@user1981275) posted a solution that is more straightforward, but then deleted their answer. Perhaps they deleted because their method converted the dates to integers since, as DWin pointed out, items in matrices must be all the same mode.

Here was their solution:

t(apply(mydf, 1, unlist))
#      Date V1 V2 V31 V32 V33 V34 V35 V36 V37 V38 V39 V310 V41 V42 V43 V44 V45
# [1,] 2922 10 11   1   2   3   4   5   6   7   8   9   10  21  22  23  24  25
# [2,] 2923 10 11  11  12  13  14  15  16  17  18  19   20  26  27  28  29  30

Here's how it can easily be modified to get the desired output. This will definitely be faster than the earlier approach:

cbind(mydf[!sapply(mydf, is.list)], 
      (t(apply(mydf[sapply(mydf, is.list)], 1, unlist))))
#         Date V1 V2 V31 V32 V33 V34 V35 V36 V37 V38 V39 V310 V41 V42 V43 V44 V45
# 1 1978-01-01 10 11   1   2   3   4   5   6   7   8   9   10  21  22  23  24  25
# 2 1978-01-02 10 11  11  12  13  14  15  16  17  18  19   20  26  27  28  29  30

Or, as a user function:

myFun <- function(data) {
  ListCols <- sapply(data, is.list)
  cbind(data[!ListCols], t(apply(data[ListCols], 1, unlist)))
}
myFun(mydf)

Update 2

I've also written a more efficient function called col_flatten that's part of my "SOfun" package.

Install the package using:

source("http://news.mrdwab.com/install_github.R")
install_github("mrdwab/SOfun")

Then, you can do:

library(SOfun)
col_flatten(mydf, names(which(sapply(mydf, is.list))), drop = TRUE)
##          Date V1 V2 V3_1 V3_2 V3_3 V3_4 V3_5 V3_6 V3_7 V3_8 V3_9 V3_10 V4_1 V4_2 V4_3 V4_4 V4_5
## 1: 1978-01-01 10 11    1    2    3    4    5    6    7    8    9    10   21   22   23   24   25
## 2: 1978-01-02 10 11   11   12   13   14   15   16   17   18   19    20   26   27   28   29   30

It's based on the transpose function in "data.table", so be sure you have "data.table" as well.

like image 109
A5C1D2H2I1M1N2O1R2T1 Avatar answered Oct 25 '22 14:10

A5C1D2H2I1M1N2O1R2T1