Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R Mutate Column That Represents the Mean of Many Other Columns

Tags:

r

dplyr

tidyverse

I am trying to add a column in my DataFrame that represents the mean of many other columns (items that represent a single construct).

The dataframe has many other columns but particularly columns eng1, eng2, eng3...engN where N is a large number and I want to take the mean of all the eng* columns and add that mean as a new column to my dataset.

I was able to do this with the following code:

narrow_ds # ... initialization of dataframe
library(dplyr)
narrow_ds <- bind_cols(narrow_ds, (narrow_ds %>% 
select(starts_with("eng")) %>% mutate(eng=rowMeans(., na.rm=TRUE))) %>% 
select(eng))

It seems that having the na.rm=TRUE requirement forced me to jump through some hoops.

My question is whether there are more straightforward ways to do this?

like image 519
Kevin T Avatar asked Dec 08 '22 14:12

Kevin T


2 Answers

You are very much in right direction. You can avoid bind_cols with a tweak in your code. Moreover, the NA are supported even in rowMeans. I have modified the sample data used by @Tung to include few NAs as well. The solutions can be as:

Option#1: Using dplyr in similar approach as OP.

library(dplyr)
DF %>% mutate(eng = rowMeans(select(.,starts_with("eng")), na.rm = TRUE))

# # A tibble: 4 x 5
#      id  eng1  eng2  eng3   eng
#   <int> <dbl> <dbl> <dbl> <dbl>
# 1     1  50.0    NA  20.0  35.0
# 2     2  NA     100  10.0  55.0
# 3     3  20.0   150  80.0  83.3
# 4     4  30.0   200  40.0  90.0

Option#2: Using apply

DF$eng <- apply(DF[,grep("eng",names(DF))], 1, mean, na.rm = TRUE)

DF
# # A tibble: 4 x 5
#      id  eng1  eng2  eng3   eng
#    <int> <dbl> <dbl> <dbl> <dbl>
# 1     1  50.0    NA  20.0  35.0
# 2     2  NA     100  10.0  55.0
# 3     3  20.0   150  80.0  83.3
# 4     4  30.0   200  40.0  90.0

Sample data:

DF = data_frame(id = 1:4,
                eng1 = c(50, NA, 20, 30), 
                eng2 = c(NA, 100, 150, 200), 
                eng3 = c(20, 10, 80, 40))
like image 89
MKR Avatar answered Dec 10 '22 04:12

MKR


Stealing sample data from @MKR, in base R:

DF$eng <- rowMeans(DF[startsWith(names(DF),"eng")], na.rm = TRUE)

# # A tibble: 4 x 5
#      id  eng1  eng2  eng3      eng
#   <int> <dbl> <dbl> <dbl>    <dbl>
# 1     1    50    NA    20 35.00000
# 2     2    NA   100    10 55.00000
# 3     3    20   150    80 83.33333
# 4     4    30   200    40 90.00000
like image 33
Moody_Mudskipper Avatar answered Dec 10 '22 03:12

Moody_Mudskipper