Elegant way to add a bunch of similar data frames cell-wise in R?

Tags:

r

dplyr

I have a list of 1000 data frames of the same shape:

dfs<-list()
for (i in 1:1000) {
  dfs[[i]]<-iris[sample(1:length(iris$Sepal.Length),80),-5]
}

Each of these is an 80 by 4 data frame. I want to add all of these data frames cell-wise (or apply some other operation) and get an 80 by 4 data frame in which each cell contains the sum, or perhaps the mean, of the corresponding 1000 cells. Is there an elegant way to do this?

asked Dec 13 '22 at 15:12 by Steve austin


2 Answers

You can use Reduce:

Reduce(`+`, dfs)
#     Sepal.Length Sepal.Width Petal.Length Petal.Width
# 122         28.0        13.2         18.7         6.1
# 87          26.8        14.9         15.1         4.5
# 100         30.8        14.6         23.1         7.7

In this case it's simple because + adds its left- and right-hand sides element-wise. For functions that don't work element-wise on whole data frames (like paste), you can apply them column by column with Map:

data.frame(Reduce(function(x,y) Map(paste,x,y), dfs))
#          Sepal.Length         Sepal.Width        Petal.Length       Petal.Width
# 1     5.6 6.2 5 5.2 6 2.8 2.2 3.3 2.7 2.2   4.9 4.5 1.4 3.9 4   2 1.5 0.2 1.4 1
# 2   6.7 4.6 4.6 5.9 5   3.1 3.4 3.1 3 2.3 4.7 1.4 1.5 4.2 3.3 1.5 0.3 0.2 1.5 1
# 3 5.7 5.8 6.7 6.1 6.5   2.8 2.7 3.3 3 2.8 4.1 4.1 5.7 4.6 4.6 1.3 1 2.5 1.4 1.5
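The same Reduce + Map pattern works with any function that combines two vectors element by element. As a hedged example, a cell-wise maximum can be sketched with base R's pmax, assuming all data frames share the same column names and order. The list is built with lapply instead of a for loop, and set.seed is added only to make the sample reproducible.

```r
# Small list of equally sized, all-numeric data frames (as in the question)
set.seed(1)
dfs <- lapply(1:5, function(i) iris[sample(nrow(iris), 3), -5])

# Map(pmax, x, y) compares two data frames column by column and keeps the
# larger value in each cell; Reduce folds this comparison over the list
cell_max <- data.frame(Reduce(function(x, y) Map(pmax, x, y), dfs))

print(cell_max)
```

Note that the result gets fresh row names (1, 2, 3), because Map returns a plain list of columns.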

Data used for the examples above (a list of 5 data frames with 3 rows each):

dfs<-list()
for (i in 1:5) {
  dfs[[i]]<-iris[sample(1:nrow(iris),3),-5]
}
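Because + works cell-wise on data frames of identical shape, the mean asked for in the question follows by dividing the Reduce sum by the number of data frames. A minimal sketch, assuming every data frame in the list has the same dimensions and only numeric columns (set.seed is added only for reproducibility):

```r
# Same kind of input as in the question, smaller for readability
set.seed(1)
dfs <- lapply(1:5, function(i) iris[sample(nrow(iris), 3), -5])

# Cell-wise sum: `+` adds two data frames element by element
total <- Reduce(`+`, dfs)

# Cell-wise mean: divide every cell of the sum by the number of data frames
avg <- total / length(dfs)

print(avg)
```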
answered Dec 18 '22 at 00:12 by Moody_Mudskipper


You can stack the data frames into a 3-dimensional array (rows x columns x data frames) and then use apply over the first two dimensions, e.g.:

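i <- nrow(dfs[[1]])   # rows per data frame
j <- ncol(dfs[[1]])   # columns per data frame
k <- length(dfs)      # number of data frames
# unlist() concatenates each data frame column by column, so the values
# fill an i x j x k array with one data frame per slice
apply(array(unlist(dfs), c(i, j, k)), c(1, 2), sum)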

     [,1] [,2] [,3] [,4]
[1,] 29.3 15.7 17.6  5.3
[2,] 29.1 16.3 18.3  6.4
[3,] 27.9 15.1 15.6  4.4
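For sums and means specifically, base R's rowSums() and rowMeans() accept a dims argument, so the apply() call can be replaced by a single vectorised call that collapses the third dimension, which is typically faster on large inputs. A sketch under the same assumptions (all data frames of identical shape and numeric), also putting the original column names back on the result:

```r
set.seed(1)
dfs <- lapply(1:5, function(i) iris[sample(nrow(iris), 3), -5])

i <- nrow(dfs[[1]])
j <- ncol(dfs[[1]])
k <- length(dfs)
arr <- array(unlist(dfs), c(i, j, k))

# dims = 2 treats the first two dimensions as "rows",
# so the sum/mean is taken over the third dimension (the k data frames)
sums  <- rowSums(arr, dims = 2)
means <- rowMeans(arr, dims = 2)

# Convert back to a data frame and restore the column names
result <- setNames(as.data.frame(sums), names(dfs[[1]]))
print(result)
```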

If you want the mean:

apply(array(unlist(dfs),c(i,j,k)),c(1,2),mean)

     [,1] [,2] [,3] [,4]
[1,] 5.86 3.14 3.52 1.06
[2,] 5.82 3.26 3.66 1.28
[3,] 5.58 3.02 3.12 0.88

If you want the max:

apply(array(unlist(dfs),c(i,j,k)),c(1,2),max)
     [,1] [,2] [,3] [,4]
[1,]  7.2  3.6  6.1  2.5
[2,]  6.9  3.8  5.7  2.3
[3,]  6.1  3.5  4.9  1.8

This works with any function that reduces a vector to a single value; for example, pasting the values together:

data.frame(apply(array(unlist(dfs),c(i,j,k)),c(1,2),paste0,collapse=","))
                   X1                  X2                X3                  X4
1   4.8,7.2,6,6.4,4.9 3.1,3.6,2.2,3.2,3.6 1.6,6.1,4,4.5,1.4   0.2,2.5,1,1.5,0.1
2 4.6,6.9,5.8,5.1,6.7   3.6,3.2,2.7,3.8,3   1,5.7,5.1,1.5,5 0.2,2.3,1.9,0.3,1.7
3 4.8,5.8,5.5,6.1,5.7   3.4,2.6,3.5,3,2.6 1.9,4,1.3,4.9,3.5   0.2,1.2,0.2,1.8,1
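The apply() results above lose the column names (X1 to X4). One way to keep them is to wrap the array approach in a small helper. The name cellwise_apply is hypothetical, not part of any package; this is a sketch assuming equally sized data frames with numeric columns:

```r
# Hypothetical helper (not from a package): apply FUN cell-wise across a
# list of equally sized data frames and return a data frame that keeps
# the column names and row names of the first element
cellwise_apply <- function(dfs, FUN, ...) {
  first <- dfs[[1]]
  # Stack the data frames into a rows x columns x list-length array
  arr <- array(unlist(dfs), c(nrow(first), ncol(first), length(dfs)))
  # Extra arguments in ... are passed through apply() to FUN
  out <- apply(arr, c(1, 2), FUN, ...)
  out <- as.data.frame(out, stringsAsFactors = FALSE)
  names(out) <- names(first)
  rownames(out) <- rownames(first)
  out
}

set.seed(1)
dfs <- lapply(1:5, function(i) iris[sample(nrow(iris), 3), -5])

cellwise_apply(dfs, median)
cellwise_apply(dfs, paste0, collapse = ",")
```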
answered Dec 17 '22 at 22:12 by KU99