I have a data frame with the following classes of variables for each column:
"date" "numeric" "numeric" "list" "list" "numeric"
The data in each row looks like this:
1978-01-01, 12.5, 6.3, c(0,0,0.25,0.45,0.3), c(0,0,0,0.1,0.9), 72
I would like to transform it into a matrix or a data frame with one value per column, so the result should look like this:
1978-01-01, 12.5, 6.3, 0, 0, 0.25, 0.45, 0.3, 0, 0, 0, 0.1, 0.9, 72
I have tried using:
j<-unlist(input)
output<-matrix(j,nrow=nrow(input),ncol=length(j)/nrow(input))
But it messes up the order of the rows in the output.
Any idea?
Additional information:
The above example is slightly simplified and dput(head(input))
returns the following sample:
structure(list(DATE = structure(c(2924, 2925, 2926, 2927, 2928,
2929), class = "Date"), TEMP_MEAN_M0 = c(-7.625, -7.375, -6,
-5.5, -7.625, -9.625), SLP_MEAN_M0 = c(1012.125, 991.975, 989.825,
986.675, 988.95, 993.075), WIND_DIR_RF_M0 = structure(list(`2.counts` = c(0,
0.625, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0, 0, 0, 0.125), `3.counts` = c(0.75,
0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), `4.counts` = c(0.375,
0.125, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0.125, 0, 0, 0), `5.counts` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125,
0, 0, 0.125, 0.375, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0), `6.counts` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125,
0, 0.25, 0.125, 0.25, 0.25, 0, 0, 0, 0, 0, 0, 0, 0, 0), `7.counts` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0.125, 0.5, 0.375, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("2.counts",
"3.counts", "4.counts", "5.counts", "6.counts", "7.counts")),
CEIL_HGT_RF_M0 = structure(list(`2.counts` = c(0.625, 0,
0, 0, 0, 0, 0, 0, 0, 0.375), `3.counts` = c(0.75, 0.125,
0, 0.125, 0, 0, 0, 0, 0, 0), `4.counts` = c(0.25, 0.125,
0, 0.125, 0, 0, 0, 0, 0.25, 0.25), `5.counts` = c(0, 0, 0,
0, 0, 0, 0, 0, 0.125, 0.875), `6.counts` = c(0, 0, 0, 0,
0, 0, 0, 0, 0, 1), `7.counts` = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 1)), .Names = c("2.counts", "3.counts", "4.counts", "5.counts",
"6.counts", "7.counts")), WIND_SPD_MEAN_M0 = c(12.8125, 18.7375,
6.175, 8.175, 10.5375, 16.5375)), .Names = c("DATE", "TEMP_MEAN_M0",
"SLP_MEAN_M0", "WIND_DIR_RF_M0", "CEIL_HGT_RF_M0", "WIND_SPD_MEAN_M0"
), row.names = c(NA, 6L), class = "data.frame")
This is somewhat messy and probably pretty inefficient, but should help get you started:
Here's some sample data:
mydf <- data.frame(Date = as.Date(c("1978-01-01", "1978-01-02")),
V1 = c(10, 10),
V2 = c(11, 11))
mydf$V3 <- list(c(1:10),
c(11:20))
mydf$V4 <- list(c(21:25),
c(26:30))
mydf
# Date V1 V2 V3 V4
# 1 1978-01-01 10 11 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 21, 22, 23, 24, 25
# 2 1978-01-02 10 11 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 26, 27, 28, 29, 30
And, a little function that checks to see which columns are lists, and for those columns, rbind
s them together, and ultimately cbind
s them with the columns that are not lists.
myFun <- function(data) {
temp1 <- sapply(data, is.list)
temp2 <- do.call(
cbind, lapply(data[temp1], function(x)
data.frame(do.call(rbind, x), check.names=FALSE)))
cbind(data[!temp1], temp2)
}
myFun(mydf)
# Date V1 V2 V3.1 V3.2 V3.3 V3.4 V3.5 V3.6 V3.7 V3.8 V3.9 V3.10 V4.1
# 1 1978-01-01 10 11 1 2 3 4 5 6 7 8 9 10 21
# 2 1978-01-02 10 11 11 12 13 14 15 16 17 18 19 20 26
# V4.2 V4.3 V4.4 V4.5
# 1 22 23 24 25
# 2 27 28 29 30
This will only work if each "column" list contain vectors of the same length (otherwise base R's rbind
will not work).
Revisiting this question half a day later, I see that another user (@user1981275) posted a solution that is more straightforward, but then deleted their answer. Perhaps they deleted because their method converted the dates to integers since, as DWin pointed out, items in matrices must be all the same mode.
Here was their solution:
t(apply(mydf, 1, unlist))
# Date V1 V2 V31 V32 V33 V34 V35 V36 V37 V38 V39 V310 V41 V42 V43 V44 V45
# [1,] 2922 10 11 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25
# [2,] 2923 10 11 11 12 13 14 15 16 17 18 19 20 26 27 28 29 30
Here's how it can easily be modified to get the desired output. This will definitely be faster than the earlier approach:
cbind(mydf[!sapply(mydf, is.list)],
(t(apply(mydf[sapply(mydf, is.list)], 1, unlist))))
# Date V1 V2 V31 V32 V33 V34 V35 V36 V37 V38 V39 V310 V41 V42 V43 V44 V45
# 1 1978-01-01 10 11 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25
# 2 1978-01-02 10 11 11 12 13 14 15 16 17 18 19 20 26 27 28 29 30
Or, as a user function:
myFun <- function(data) {
ListCols <- sapply(data, is.list)
cbind(data[!ListCols], t(apply(data[ListCols], 1, unlist)))
}
myFun(mydf)
I've also written a more efficient function called col_flatten
that's part of my "SOfun" package.
Install the package using:
source("http://news.mrdwab.com/install_github.R")
install_github("mrdwab/SOfun")
Then, you can do:
library(SOfun)
col_flatten(mydf, names(which(sapply(mydf, is.list))), drop = TRUE)
## Date V1 V2 V3_1 V3_2 V3_3 V3_4 V3_5 V3_6 V3_7 V3_8 V3_9 V3_10 V4_1 V4_2 V4_3 V4_4 V4_5
## 1: 1978-01-01 10 11 1 2 3 4 5 6 7 8 9 10 21 22 23 24 25
## 2: 1978-01-02 10 11 11 12 13 14 15 16 17 18 19 20 26 27 28 29 30
It's based on the transpose
function in "data.table", so be sure you have "data.table" as well.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With