
What is the most efficient way to cast a list as a data frame?

Tags: list, dataframe, r

Very often I want to convert a list, in which every element has the same structure and element types, to a data frame. For example, I may have a list:

    > my.list
    [[1]]
    [[1]]$global_stdev_ppb
    [1] 24267673

    [[1]]$range
    [1] 0.03114799

    [[1]]$tok
    [1] "hello"

    [[1]]$global_freq_ppb
    [1] 211592.6


    [[2]]
    [[2]]$global_stdev_ppb
    [1] 11561448

    [[2]]$range
    [1] 0.08870838

    [[2]]$tok
    [1] "world"

    [[2]]$global_freq_ppb
    [1] 1002043

I want to convert this list to a data frame where each index element is a column. The natural (to me) thing to do is to use do.call:

    > my.matrix<-do.call("rbind", my.list)
    > my.matrix
         global_stdev_ppb range      tok     global_freq_ppb
    [1,] 24267673         0.03114799 "hello" 211592.6
    [2,] 11561448         0.08870838 "world" 1002043

Straightforward enough, but when I attempt to cast this matrix as a data frame, the columns remain list elements, rather than vectors:

    > my.df<-as.data.frame(my.matrix, stringsAsFactors=FALSE)
    > my.df[,1]
    [[1]]
    [1] 24267673

    [[2]]
    [1] 11561448
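For anyone who wants to reproduce this, here is a minimal sketch of why the columns stay as lists. It assumes my.list was built as a plain list of named lists, with values copied from the printed output above; how the original list was actually created is not shown in the question.

```r
# Rebuild the example list (assumption: values copied from the printed output).
my.list <- list(
  list(global_stdev_ppb = 24267673, range = 0.03114799,
       tok = "hello", global_freq_ppb = 211592.6),
  list(global_stdev_ppb = 11561448, range = 0.08870838,
       tok = "world", global_freq_ppb = 1002043)
)

# rbind() on lists returns a matrix whose cells are list elements,
# not a numeric or character matrix.
my.matrix <- do.call("rbind", my.list)
is.matrix(my.matrix)   # TRUE
typeof(my.matrix)      # "list"

# as.data.frame() then carries each matrix column over as a list column.
my.df <- as.data.frame(my.matrix, stringsAsFactors = FALSE)
sapply(my.df, class)   # every column reports "list"
```

So the data frame conversion itself works; the list structure is inherited from the matrix that rbind produces.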

Currently, to get the data frame cast properly I am iterating over each column using unlist and as.vector, then recasting the data frame as such:

    new.list<-lapply(1:ncol(my.matrix), function(x) as.vector(unlist(my.matrix[,x])))
    my.df<-as.data.frame(do.call(cbind, new.list), stringsAsFactors=FALSE)

This, however, seems very inefficient. Is there a better way to do this?
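A side note on the unlist/cbind workaround, as a sketch under the same assumption that my.list is a plain list of named lists rebuilt from the printed values: cbind has to settle on one type for the whole matrix, so mixing numeric and character vectors appears to leave every column as character.

```r
# Rebuild the example list (assumption: values copied from the printed output).
my.list <- list(
  list(global_stdev_ppb = 24267673, range = 0.03114799,
       tok = "hello", global_freq_ppb = 211592.6),
  list(global_stdev_ppb = 11561448, range = 0.08870838,
       tok = "world", global_freq_ppb = 1002043)
)
my.matrix <- do.call("rbind", my.list)

# The workaround from the question: unlist each column, then cbind them back.
new.list <- lapply(seq_len(ncol(my.matrix)),
                   function(x) as.vector(unlist(my.matrix[, x])))
my.df <- as.data.frame(do.call(cbind, new.list), stringsAsFactors = FALSE)

# cbind() coerced the numeric vectors to character to match the "tok" column.
sapply(my.df, class)
```

If that holds for your data, the workaround costs more than speed: the numeric columns would need converting back afterwards.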

asked Dec 22 '10 18:12 by DrewConway




1 Answer

I think you want:

    > do.call(rbind, lapply(my.list, data.frame, stringsAsFactors=FALSE))
      global_stdev_ppb      range   tok global_freq_ppb
    1         24267673 0.03114799 hello        211592.6
    2         11561448 0.08870838 world       1002043.0
    > str(do.call(rbind, lapply(my.list, data.frame, stringsAsFactors=FALSE)))
    'data.frame':   2 obs. of  4 variables:
     $ global_stdev_ppb: num  24267673 11561448
     $ range           : num  0.0311 0.0887
     $ tok             : chr  "hello" "world"
     $ global_freq_ppb : num  211593 1002043
answered Oct 03 '22 17:10 by Joshua Ulrich
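
The answer above can be run end to end with the sketch below. It assumes my.list is a plain list of named lists rebuilt from the values printed in the question; the intermediate name rows is only for illustration and does not appear in the original answer.

```r
# Rebuild the example list (assumption: values copied from the printed output).
my.list <- list(
  list(global_stdev_ppb = 24267673, range = 0.03114799,
       tok = "hello", global_freq_ppb = 211592.6),
  list(global_stdev_ppb = 11561448, range = 0.08870838,
       tok = "world", global_freq_ppb = 1002043)
)

# Turn each inner list into a one-row data frame, so every field
# already has its own atomic type before anything is combined.
rows <- lapply(my.list, data.frame, stringsAsFactors = FALSE)

# Stack the one-row data frames; rbind on data frames keeps column types.
my.df <- do.call(rbind, rows)

# Check that the columns are plain vectors rather than list columns.
sapply(my.df, class)
```

The difference from the approach in the question is the order of operations: each element becomes a data frame first, so rbind combines typed columns instead of building a list matrix.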