I have a list of named elements (testlist) where some of the names are duplicated:
$x
[1] "one"
$x
[1] "two"
$y
[1] "three"
$y
[1] "four"
I am trying to end up with a data.table that combines the elements with common names into the same column:
     x     y
1: one three
2: two  four
I have tried
testdf <- do.call(cbind, lapply(testlist, data.table))
but only end up with:
   x.V1 x.V1  y.V1 y.V1
1:  one  two three four
Any suggestions? Appreciate the help!
Try
library(data.table)#v1.9.5+
dcast(setDT(stack(testlist))[, N := 1:.N, ind],
      N ~ ind, value.var = 'values')[, N := NULL][]
#     x     y
#1: one three
#2: two  four
Or a base R approach would be
unstack(stack(testlist), values ~ ind)
#    x     y
#1 one three
#2 two  four
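To see why both versions work, it may help to look at the intermediate long-format table that stack() builds. This is only a sketch: the question does not show how testlist was created, so the construction below is an assumption.

```r
# Assumed construction of the question's list; the names repeat on purpose
testlist <- list(x = "one", x = "two", y = "three", y = "four")

# stack() flattens the list into a long data frame with two columns:
# `values` (the elements) and `ind` (a factor holding each element's name)
long <- stack(testlist)
print(long)

# unstack() splits `values` by `ind`; because both groups have the same
# length, the result comes back as a data frame with one column per name
wide <- unstack(long, values ~ ind)
print(wide)
```

The data.table version does the same reshaping, but adds an explicit within-name counter (N) so that dcast knows which row each value belongs to.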
A more efficient base R alternative might be:
data.frame(split(unlist(L, use.names = FALSE), names(L)))
#     x     y
# 1 one three
# 2 two  four
Sample data:
L <- as.list(setNames(c("one", "two", "three", "four"), c("x", "x", "y", "y")))
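One caveat with the split() approach: data.frame() needs every group to have the same length, and split() orders the columns by sorted name rather than by order of appearance. If some names occur more often than others, one possible workaround is to pad the shorter groups with NA first. The list L2 below is a hypothetical input used only to illustrate this; it is not part of the question.

```r
# Hypothetical input where "x" occurs twice but "y" only once
L2 <- list(x = "one", x = "two", y = "three")

# split() groups the flattened values by name
groups <- split(unlist(L2, use.names = FALSE), names(L2))

# Pad every group with NA up to the length of the longest group,
# so that data.frame() accepts them as equal-length columns
n <- max(lengths(groups))
padded <- lapply(groups, function(v) {
  length(v) <- n  # extending a vector's length fills the tail with NA
  v
})

res <- data.frame(padded, stringsAsFactors = FALSE)
print(res)
```

Without the padding step, data.frame() would stop with an error about differing numbers of rows.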
Also, in "data.table", it would be more efficient to create your data.table manually rather than using stack:
library(data.table) # V1.9.4
dcast.data.table(
  data.table(val = unlist(L, use.names = FALSE), var = names(L))[
    , rn := seq(.N), by = var], rn ~ var, value.var = "val")[, rn := NULL][]
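The one-liner above can be unpacked into its steps to show what the rn column is for. This is a sketch that assumes a data.table version recent enough (1.9.6 or later) for plain dcast() to work on data.tables; the variable names long and wide are mine.

```r
library(data.table)

L <- as.list(setNames(c("one", "two", "three", "four"), c("x", "x", "y", "y")))

# Long table: one row per list element, `var` holds the element's name
long <- data.table(val = unlist(L, use.names = FALSE), var = names(L))

# rn counts occurrences within each name: 1, 2 for "x" and 1, 2 for "y"
long[, rn := seq_len(.N), by = var]
print(long)

# Cast to wide: rn identifies the output row, each distinct name becomes
# a column; rn is dropped afterwards because it was only a helper key
wide <- dcast(long, rn ~ var, value.var = "val")[, rn := NULL]
print(wide)
```

Without rn, dcast would have no way to tell "one" and "two" apart within x, so the counter is what places them on separate rows.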
Benchmarks on a larger list:
# Required packages
library(stringi)
library(microbenchmark)
library(data.table)
# Sample data
set.seed(1) # for reproducible data
nr = 10000 # final number of rows expected
nc = 100 # final number of columns expected
L <- as.list(setNames(sample(100, nc*nr, TRUE), rep(stri_rand_strings(nc, 7), nr)))
# Functions to benchmark
funak_b <- function() unstack(stack(L), values ~ ind)
funak_dt <- function() {
  dcast.data.table(setDT(stack(L))[, N := 1:.N, ind],
                   N ~ ind, value.var = 'values')[, N := NULL][]
}
funam_b <- function() data.frame(split(unlist(L, use.names = FALSE), names(L)))
funam_dt <- function() {
  dcast.data.table(
    data.table(val = unlist(L, use.names = FALSE), var = names(L))[
      , rn := seq(.N), by = var], rn ~ var, value.var = "val")[, rn := NULL][]
}
# Results
microbenchmark(funak_b(), funak_dt(), funam_b(), funam_dt(), times = 20)
# Unit: milliseconds
#        expr        min         lq      mean    median        uq       max neval
#   funak_b() 2171.53485 2292.55003 2434.8899 2463.1977 2546.4671 2687.5924    20
#  funak_dt() 2364.68148 2598.00309 2646.6790 2643.5328 2694.8609 2902.6150    20
#   funam_b()   91.88414   93.09794  104.0179   96.4256  100.4168  204.0342    20
#  funam_dt()  238.17656  249.59135  344.9249  310.8694  423.6861  508.1844    20
I guess I'd stick with base R on this one :-)