Fastest way to transform dataframe to list of lists in R

Tags:

r

I frequently need to transform a data frame into a list of lists, with one inner list per row. A key requirement is that types are preserved (i.e., a number stays a number and a string stays a string). This is the function I currently use:

dataframe_to_lists <- function(df){
  return_list <- lapply(split(df, seq_len(nrow(df))), function(row) as.list(row))
  return(return_list)
}

This gives the correct result, but it becomes slow once the number of rows grows large (>10K). What is the fastest way to do this in R?

Here is an example:

> example_df <- data.frame(col1 = c('a', 'b', 'c'), col2 = c(1, 2, 3), col3 = c(4, 5, 6), stringsAsFactors = FALSE)
> list_result <- dataframe_to_lists(example_df)
> list_result
$`1`
$`1`$col1
[1] "a"
$`1`$col2
[1] 1
$`1`$col3
[1] 4

$`2`
$`2`$col1
[1] "b"
$`2`$col2
[1] 2
$`2`$col3
[1] 5

$`3`
$`3`$col1
[1] "c"
$`3`$col2
[1] 3
$`3`$col3
[1] 6
Matthew Crews asked Mar 13 '23 02:03


1 Answer

Try converting each column to a list first, then regrouping the elements by row:

lis <- rapply(df,as.list,how="list")
lis2 <- lapply(1:length(lis[[1]]), function(i) lapply(lis, "[[", i))

@A.Webb suggested a simpler and faster solution:

do.call(function(...) Map(list,...),df)
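
To see why the one-liner above works, here is a minimal sketch. A data frame is a list of columns, so do.call() passes each column as a named argument, and Map() then walks the columns in parallel and builds one list per row. The wrapper name dataframe_to_lists_map is only an illustrative assumption, not something from the original comment. One thing to be aware of: when the first column is a character vector, Map() uses its values as the names of the outer list, so unname() is applied here to get plain positional rows.

```r
# Hypothetical wrapper around the one-liner; the name is illustrative only.
dataframe_to_lists_map <- function(df) {
  # Each column becomes a named argument; Map() zips them element by element.
  rows <- do.call(function(...) Map(list, ...), df)
  # Drop outer names that Map() may derive from a character first column.
  unname(rows)
}

example_df <- data.frame(col1 = c("a", "b", "c"), col2 = c(1, 2, 3),
                         col3 = c(4, 5, 6), stringsAsFactors = FALSE)
rows <- dataframe_to_lists_map(example_df)

# Inspect the first row: column names and types are kept.
str(rows[[1]])
```

Because each row is built with list() rather than c(), the elements are never coerced to a common type, which is the property the question asks for.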

Example:

set.seed(1)
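# stringsAsFactors = TRUE (the default before R 4.0) reproduces the factor output shown below
df <- data.frame(col1 = letters[1:10], col2 = 1:10, col3 = rnorm(10), stringsAsFactors = TRUE)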

df
   col1 col2       col3
1     a    1 -0.6264538
2     b    2  0.1836433
3     c    3 -0.8356286
4     d    4  1.5952808
5     e    5  0.3295078
6     f    6 -0.8204684
7     g    7  0.4874291
8     h    8  0.7383247
9     i    9  0.5757814
10    j   10 -0.3053884

lis <- rapply(df,as.list,how="list")
lis2 <- lapply(1:length(lis[[1]]), function(i) lapply(lis, "[[", i))

head(lis2, 2)

[[1]]
[[1]]$col1
[1] a
Levels: a b c d e f g h i j

[[1]]$col2
[1] 1

[[1]]$col3
[1] -0.6264538


[[2]]
[[2]]$col1
[1] b
Levels: a b c d e f g h i j

[[2]]$col2
[1] 2

[[2]]$col3
[1] 0.1836433

Benchmark:

set.seed(123)
N <- 100000
df <- data.frame(col1 = rep("A", N), col2 = 1:N, col3 = rnorm(N)) 

system.time({
    lis <- rapply(df,as.list,how="list")
    lis2 <- lapply(1:length(lis[[1]]), function(i) lapply(lis, "[[", i))
})

   user  system elapsed
   1.36    0.00    1.36

system.time(do.call(function(...) Map(list,...),df))

   user  system elapsed
   0.69    0.00    0.69
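
If you want to check on your own machine that the Map-based approach returns the same rows as the function from the question, something like the following sketch may help. The size n_small and the wrapper name dataframe_to_lists_map are assumptions made for illustration, and timings will vary by machine and R version, so none are quoted here.

```r
# Approach from the question: split into one-row data frames, then as.list().
dataframe_to_lists <- function(df) {
  lapply(split(df, seq_len(nrow(df))), function(row) as.list(row))
}

# Map-based approach (hypothetical wrapper name).
dataframe_to_lists_map <- function(df) {
  do.call(function(...) Map(list, ...), df)
}

set.seed(123)
n_small <- 1000  # assumed size, kept small because the split() version is slow
small_df <- data.frame(col1 = rep("A", n_small), col2 = seq_len(n_small),
                       col3 = rnorm(n_small), stringsAsFactors = FALSE)

by_split <- dataframe_to_lists(small_df)
by_map <- dataframe_to_lists_map(small_df)

# The outer names differ between the two, so compare with names dropped.
same_rows <- isTRUE(all.equal(unname(by_split), unname(by_map)))
print(same_rows)

# Time both versions on the same data.
print(system.time(dataframe_to_lists(small_df)))
print(system.time(dataframe_to_lists_map(small_df)))
```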
slamballais answered Mar 31 '23 15:03