I have five data frames of about 60 columns each that I need to combine. They all have the same columns, and since they represent the same values, I'm combining them by taking the element-wise mean. Combining them is not the problem; doing it efficiently is. Here is sample data/code:
#reproducible random data
set.seed(123)
dat1 <- data.frame( a = rnorm(16), b = rnorm(16), c = rnorm(16), d = rnorm(16), e = rnorm(16), f = rnorm(16))
dat2 <- data.frame( a = rnorm(16), b = rnorm(16), c = rnorm(16), d = rnorm(16), e = rnorm(16), f = rnorm(16))
dat3 <- data.frame( a = rnorm(16), b = rnorm(16), c = rnorm(16), d = rnorm(16), e = rnorm(16), f = rnorm(16))
#This works but is inefficient
final_data <- data.frame(a = rowMeans(cbind(dat1$a, dat2$a, dat3$a)),
                         b = rowMeans(cbind(dat1$b, dat2$b, dat3$b)),
                         c = rowMeans(cbind(dat1$c, dat2$c, dat3$c)),
                         d = rowMeans(cbind(dat1$d, dat2$d, dat3$d)),
                         e = rowMeans(cbind(dat1$e, dat2$e, dat3$e)),
                         f = rowMeans(cbind(dat1$f, dat2$f, dat3$f)))
#what results should look like
head(final_data)
#             a           b          c           d            e           f
# 1 0.573813625  0.17695841 -0.1434628 -0.53673101  0.353906578  0.24262067
# 2 0.135689926 -0.69206908  0.2888584 -0.37215810 -0.038298083 -0.23317107
# 3 0.004068807  0.44666945  0.5205118  0.09587453 -0.308528454  0.30516883
# 4 0.347100292  0.02401646  0.1409754 -0.15931120  0.587047386 -0.08684867
# 5 0.006529998  0.09010946  0.4932670  0.62606230 -0.005235813 -0.36967000
# 6 0.240225778 -0.45824825 -0.5000004  0.66131121  0.619480608  0.55650611
The issue is that I don't want to write out a=rowMeans(cbind(dat1$a,dat2$a,dat3$a)) for each of the 60 columns in the new data frame. Can you think of a good way to go about this?
EDIT: I'm going to accept the following answer, since it lets me choose which columns to apply it over:
final_data1 <- as.data.frame(sapply(colnames(dat1), function(i)
  rowMeans(cbind(dat1[, i], dat2[, i], dat3[, i]))))
> identical(final_data1,final_data)
[1] TRUE
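The accepted version still names dat1, dat2 and dat3 explicitly. A minimal sketch of the same column-by-column idea for any number of data frames (five in the real data) collects them in a list first. The helper name mean_frames and its cols argument are hypothetical, and the sketch assumes every data frame has the same number of rows and contains every column listed in cols.

```r
# Hypothetical helper: average a list of same-shaped data frames,
# column by column, over the columns named in `cols`.
mean_frames <- function(dfs, cols = names(dfs[[1]])) {
  out <- lapply(cols, function(i) {
    # One matrix column per data frame, each holding that frame's column i
    m <- do.call(cbind, lapply(dfs, `[[`, i))
    rowMeans(m)
  })
  names(out) <- cols
  data.frame(out, check.names = FALSE)
}

# Example: five random data frames shaped like the question's sample data
set.seed(123)
dfs <- lapply(1:5, function(k) data.frame(a = rnorm(16), b = rnorm(16), c = rnorm(16),
                                          d = rnorm(16), e = rnorm(16), f = rnorm(16)))

all_cols  <- mean_frames(dfs)                      # all six columns
some_cols <- mean_frames(dfs, cols = c("f", "a"))  # a chosen subset, in a chosen order
```

Using do.call(cbind, ...) instead of an inner sapply() keeps the intermediate a matrix even when the data frames have a single row, which rowMeans() requires.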
How about this?
(dat1+dat2+dat3)/3
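This works because the arithmetic operators + and / act cell by cell on data frames of the same dimensions, so the sum divided by 3 is the element-wise mean. A quick check against the question's sapply()/rowMeans() version is sketched below; it compares with all.equal() rather than identical() because adding with + and averaging with rowMeans() may round differently in the last bits.

```r
# Sample data from the question
set.seed(123)
dat1 <- data.frame(a = rnorm(16), b = rnorm(16), c = rnorm(16), d = rnorm(16), e = rnorm(16), f = rnorm(16))
dat2 <- data.frame(a = rnorm(16), b = rnorm(16), c = rnorm(16), d = rnorm(16), e = rnorm(16), f = rnorm(16))
dat3 <- data.frame(a = rnorm(16), b = rnorm(16), c = rnorm(16), d = rnorm(16), e = rnorm(16), f = rnorm(16))

# Element-wise mean: + and / operate cell by cell on same-shaped data frames
avg <- (dat1 + dat2 + dat3) / 3

# Column-by-column version from the question, for comparison
verbose <- as.data.frame(sapply(colnames(dat1), function(i)
  rowMeans(cbind(dat1[, i], dat2[, i], dat3[, i]))))

all.equal(avg, verbose)  # TRUE
```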
Or, to first select (and optionally reorder) a subset of the columns and then add the resulting data.frames, you could do this:
jj <- letters[1:6]
Reduce(`+`, lapply(list(dat1,dat2,dat3), `[`, jj))/3
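A different technique, not the one used in this answer, is to stack the data frames into a three-dimensional array and average over the third dimension with rowMeans(..., dims = 2). It divides by the number of data frames automatically, and na.rm = TRUE skips missing values, which the Reduce() sum would propagate as NA. The sketch assumes all data frames are entirely numeric with identical dimensions; jj plays the same role as above, and the list of five random data frames is only illustrative.

```r
set.seed(123)
dfs <- lapply(1:5, function(k) data.frame(a = rnorm(16), b = rnorm(16), c = rnorm(16),
                                          d = rnorm(16), e = rnorm(16), f = rnorm(16)))
jj <- letters[1:6]

# Stack into an array of dimension rows x columns x data frames (16 x 6 x 5)
arr <- sapply(dfs, function(d) as.matrix(d[jj]), simplify = "array")

# Mean over the third dimension for every (row, column) cell
avg <- as.data.frame(rowMeans(arr, dims = 2, na.rm = TRUE))

# Same result as the Reduce() approach, dividing by the list length instead of a fixed 3
avg_reduce <- Reduce(`+`, lapply(dfs, `[`, jj)) / length(dfs)
all.equal(avg, avg_reduce)  # TRUE
```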