
New dataframe of means across dataframes

Tags:

r

I have five data frames of about 60 columns each that I need to combine. They all have the same columns, and because each one holds measurements of the same values, I'm combining them by taking the mean across data frames. Combining them isn't the problem; doing so efficiently is. Here is sample data and code:

#reproducible random data
set.seed(123)

dat1 <- data.frame( a = rnorm(16), b = rnorm(16), c = rnorm(16), d = rnorm(16), e = rnorm(16), f = rnorm(16))
dat2 <- data.frame( a = rnorm(16), b = rnorm(16), c = rnorm(16), d = rnorm(16), e = rnorm(16), f = rnorm(16))
dat3 <- data.frame( a = rnorm(16), b = rnorm(16), c = rnorm(16), d = rnorm(16), e = rnorm(16), f = rnorm(16))

#This works but is inefficient

final_data<-data.frame(a=rowMeans(cbind(dat1$a,dat2$a,dat3$a)),
                       b=rowMeans(cbind(dat1$b,dat2$b,dat3$b)),
                       c=rowMeans(cbind(dat1$c,dat2$c,dat3$c)),
                       d=rowMeans(cbind(dat1$d,dat2$d,dat3$d)),
                       e=rowMeans(cbind(dat1$e,dat2$e,dat3$e)),
                       f=rowMeans(cbind(dat1$f,dat2$f,dat3$f))
)
#what results should look like
head(final_data)
#             a           b          c           d            e           f
# 1 0.573813625  0.17695841 -0.1434628 -0.53673101  0.353906578  0.24262067
# 2 0.135689926 -0.69206908  0.2888584 -0.37215810 -0.038298083 -0.23317107
# 3 0.004068807  0.44666945  0.5205118  0.09587453 -0.308528454  0.30516883
# 4 0.347100292  0.02401646  0.1409754 -0.15931120  0.587047386 -0.08684867
# 5 0.006529998  0.09010946  0.4932670  0.62606230 -0.005235813 -0.36967000
# 6 0.240225778 -0.45824825 -0.5000004  0.66131121  0.619480608  0.55650611

The issue here is that I don't want to rewrite a=rowMeans(cbind(dat1$a,dat2$a,dat3$a)) for each of 60 columns in the new data frame. Can you think of a good way to go about this?

EDIT: I'm going to accept the following answer, since it lets me choose which columns to apply it to:

final_data1<-as.data.frame(sapply(colnames(dat1),function(i)
    rowMeans(cbind(dat1[,i],dat2[,i],dat3[,i]))))

> identical(final_data1,final_data)
[1] TRUE
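The same idea can be sketched for any number of data frames and an arbitrary subset of columns. This is only an illustration: the list `dfs`, the vector `cols`, and the small example data are assumptions made here, not part of the original post.

```r
# Hypothetical example data; any data frames sharing these column names would do.
set.seed(123)
dfs <- list(
  data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4)),
  data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4)),
  data.frame(a = rnorm(4), b = rnorm(4), c = rnorm(4))
)

cols <- c("a", "c")  # columns to average (an assumed subset)

# For each chosen column, bind that column from every data frame side by side
# and take the row-wise mean.
col_means <- sapply(cols, function(i) {
  rowMeans(do.call(cbind, lapply(dfs, `[[`, i)))
})
result <- as.data.frame(col_means)
```

Keeping the data frames in a list means a fourth or fifth one can be added without touching the averaging code.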
Jason asked Dec 25 '22 20:12



1 Answer

How about this?

(dat1+dat2+dat3)/3

Or, to first select/reorder a subset of the columns, and then add the resulting data.frames, you could do this:

jj <- letters[1:6]
Reduce(`+`, lapply(list(dat1,dat2,dat3), `[`, jj))/3
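A small sketch of why selecting columns by name first can matter, and how to avoid hard-coding the divisor. As far as I know, arithmetic between data frames lines columns up by position rather than by name, so a data frame with its columns in a different order would otherwise be averaged incorrectly. The reduced example data and the names `dfs`, `avg`, and `check` are assumptions for illustration only.

```r
set.seed(123)
dat1 <- data.frame(a = rnorm(4), b = rnorm(4))
dat2 <- data.frame(b = rnorm(4), a = rnorm(4))  # columns deliberately in a different order
dat3 <- data.frame(a = rnorm(4), b = rnorm(4))

dfs <- list(dat1, dat2, dat3)
jj <- c("a", "b")

# Select columns by name so every data frame has the same column order,
# then sum element-wise and divide by the number of data frames.
avg <- Reduce(`+`, lapply(dfs, `[`, jj)) / length(dfs)

# Cross-check one column against an explicit row-wise mean.
check <- rowMeans(cbind(dat1$a, dat2$a, dat3$a))
all.equal(avg$a, check)
```

Dividing by `length(dfs)` instead of a literal `3` keeps the code correct when more data frames are added to the list.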
Josh O'Brien answered Jan 07 '23 11:01
