Converting a list of named elements into a data frame or data table

Tags: dataframe, r

I have a list of named elements (testlist) in which some of the names are duplicated:

$x
[1] "one"

$x
[1] "two"

$y
[1] "three"

$y
[1] "four"

I am trying to end up with a data table that combines the elements with common names into the same column:

     x     y
1: one three
2: two  four

I have tried:

testdf <- do.call(cbind, lapply(testlist, data.table))

but only end up with:

   x.V1 x.V1  y.V1 y.V1
1:  one  two three four

Any suggestions? Appreciate the help!

AlexT asked Jul 07 '15 15:07


2 Answers

Try:

library(data.table)  # v1.9.5+
dcast(setDT(stack(testlist))[, N := 1:.N, by = ind],
      N ~ ind, value.var = 'values')[, N := NULL][]
#    x     y
#1: one three
#2: two  four

Or, as a base R approach:

unstack(stack(testlist), values ~ ind)
#   x     y
#1 one three
#2 two  four
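
To see why the base R one-liner works, it can help to look at the intermediate long-format table. The following is a minimal sketch, assuming `testlist` can be rebuilt as shown (the question does not show how it was created, so that construction is an assumption):

```r
# Assumed reconstruction of the list from the question (names repeat on purpose)
testlist <- list(x = "one", x = "two", y = "three", y = "four")

# stack() flattens the list into long form: one row per element,
# with the element's name stored in the factor column `ind`
long <- stack(testlist)
print(long)

# unstack() splits `values` by `ind`, giving one column per distinct name
wide <- unstack(long, values ~ ind)
print(wide)
```

Because every name occurs the same number of times here, `unstack()` can return a data frame; with unequal counts it would return a list instead.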

akrun answered Oct 14 '22 15:10


A more efficient base R alternative might be:

data.frame(split(unlist(L, use.names = FALSE), names(L)))
#     x     y
# 1 one three
# 2 two  four

Sample data:

L <- as.list(setNames(c("one", "two", "three", "four"), c("x", "x", "y", "y")))
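
The one-liner above can be unpacked into its three steps. This is only an illustrative sketch using the same sample data; the intermediate names `vals`, `groups`, and `result` are hypothetical and not part of the original answer:

```r
# Same sample data as above
L <- as.list(setNames(c("one", "two", "three", "four"), c("x", "x", "y", "y")))

# Step 1: flatten the list to a plain vector, dropping the names
vals <- unlist(L, use.names = FALSE)

# Step 2: group the values by the original element names
groups <- split(vals, names(L))

# The approach relies on every name occurring the same number of times
stopifnot(length(unique(lengths(groups))) == 1)

# Step 3: each group becomes one column
result <- data.frame(groups)
print(result)
```

Note that `split()` orders the groups by the sorted names rather than by first appearance, so the column order may differ from the order in the list.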

Also, in "data.table", it would be more efficient to create your data.table manually rather than using stack:

library(data.table) # V1.9.4
dcast.data.table(
  data.table(val = unlist(L, use.names = FALSE), var = names(L))[
    , rn := seq(.N), by = var], rn ~ var, value.var = "val")[, rn := NULL][]
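
The `rn := seq(.N), by = var` step numbers each value within its name, so that every (row, column) cell of the wide result is filled by exactly one value. As a rough base R analogue of that idea (not the data.table code itself; `long` and `wide` are hypothetical names, and `ave()` stands in for the grouped `seq(.N)`):

```r
L <- as.list(setNames(c("one", "two", "three", "four"), c("x", "x", "y", "y")))

# Long form: one row per list element
long <- data.frame(val = unlist(L, use.names = FALSE),
                   var = names(L),
                   stringsAsFactors = FALSE)

# Number the rows within each name; this plays the role of `rn`
long$rn <- ave(seq_along(long$var), long$var, FUN = seq_along)

# Fill a character matrix using (rn, column) pairs as coordinates
cols <- unique(long$var)
wide <- matrix(NA_character_, nrow = max(long$rn), ncol = length(cols),
               dimnames = list(NULL, cols))
wide[cbind(long$rn, match(long$var, cols))] <- long$val
print(wide)
```

Without the within-name counter there would be nothing to distinguish the two `x` values from each other, which is why the cast needs it.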

Benchmarks on a larger list:

# Required packages
library(stringi)
library(microbenchmark)
library(data.table)

# Sample data
set.seed(1)    # for reproducible data
nr <- 10000    # final number of rows expected
nc <- 100      # final number of columns expected
L <- as.list(setNames(sample(100, nc*nr, TRUE), rep(stri_rand_strings(nc, 7), nr)))

# Functions to benchmark
funak_b <- function() unstack(stack(L), values ~ ind)
funak_dt <- function() {
  dcast.data.table(setDT(stack(L))[, N:= 1:.N, ind],
                   N ~ ind, value.var = 'values')[, N := NULL][]
}
funam_b <- function() data.frame(split(unlist(L, use.names = FALSE), names(L)))
funam_dt <- function() {
  dcast.data.table(
    data.table(val = unlist(L, use.names = FALSE), var = names(L))[
      , rn := seq(.N), by = var], rn ~ var, value.var = "val")[, rn := NULL][]
}

# Results
microbenchmark(funak_b(), funak_dt(), funam_b(), funam_dt(), times = 20)
# Unit: milliseconds
#        expr        min         lq      mean    median        uq       max neval
#   funak_b() 2171.53485 2292.55003 2434.8899 2463.1977 2546.4671 2687.5924    20
#  funak_dt() 2364.68148 2598.00309 2646.6790 2643.5328 2694.8609 2902.6150    20
#   funam_b()   91.88414   93.09794  104.0179   96.4256  100.4168  204.0342    20
#  funam_dt()  238.17656  249.59135  344.9249  310.8694  423.6861  508.1844    20

I guess I'd stick with base R on this one :-)

A5C1D2H2I1M1N2O1R2T1 answered Oct 14 '22 15:10