How to get the average of two columns using dplyr?

Tags: r, dplyr, mean

How do I get the average of two columns of a data table using dplyr? For example, if my data is like below:

dt <- data.table(A=1:5, B=c(1,4,NA,6,8))

I want to create a new column "Avg" which is the mean of columns A and B for each row:

dt %>% mutate(Avg=mean(c(A, B), na.rm=T))

But this code does not give me the correct result. How can I do this? Thank you very much.

Carter asked Dec 09 '15 01:12


3 Answers

If you want to use dplyr to achieve this, I would suggest using the function rowwise():

    R> library(dplyr)
    R> library(data.table)  # needed for data.table()
    R> dt <- data.table(A=1:5, B=c(1,4,NA,6,8))
    R> j <- dt %>% rowwise() %>% mutate(Avg=mean(c(A, B), na.rm=TRUE))
    R> j
    Source: local data frame [5 x 3]
    Groups: <by row>

          A     B   Avg
      (int) (dbl) (dbl)
    1     1     1   1.0
    2     2     4   3.0
    3     3    NA   3.0
    4     4     6   5.0
    5     5     8   6.5
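
As a hedged variation on the rowwise() approach, the sketch below assumes dplyr 1.0.0 or later, where c_across() is available for picking columns inside a row-wise mutate(). It uses a plain data.frame named df (a name chosen here for illustration) and calls ungroup() at the end so later steps are not still evaluated row by row.

```r
library(dplyr)

# Same values as the question's data, as a plain data.frame (assumption)
df <- data.frame(A = 1:5, B = c(1, 4, NA, 6, 8))

res <- df %>%
  rowwise() %>%                                           # evaluate mutate() one row at a time
  mutate(Avg = mean(c_across(c(A, B)), na.rm = TRUE)) %>% # mean of A and B within the row
  ungroup()                                               # drop the row-wise grouping again

res$Avg
```

Row-wise evaluation can be slow on large data, so this is mainly useful when the per-row function has no vectorized equivalent.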

Stedy answered Oct 21 '22 11:10


How about

dt %>% mutate(Avg=rowMeans(cbind(A, B), na.rm=T))

mean() is not vectorized across rows: it collapses all of its input into a single value, which mutate() then repeats on every row. If you build a matrix with cbind(), rowMeans() computes one mean per row instead.
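
The difference can be seen with base R alone. This is a minimal sketch using the same values as the question's columns; the variable names pooled and per_row are made up for illustration.

```r
# Same values as columns A and B in the question
A <- 1:5
B <- c(1, 4, NA, 6, 8)

# mean() pools all nine non-missing values into one number
pooled <- mean(c(A, B), na.rm = TRUE)

# rowMeans() treats each row of the matrix separately
per_row <- rowMeans(cbind(A, B), na.rm = TRUE)

pooled   # a single value
per_row  # one value per row
```

Inside mutate(), the single pooled value would be recycled down the whole Avg column, which is why the original attempt does not give a per-row average.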

MrFlick answered Oct 21 '22 11:10


As the initial dataset is a data.table, we could use data.table methods, grouping by row number so that each row is averaged separately:

dt[, Avg := mean(unlist(.SD), na.rm=TRUE), by = 1:nrow(dt)]
dt
#   A  B Avg
#1: 1  1 1.0
#2: 2  4 3.0
#3: 3 NA 3.0
#4: 4  6 5.0
#5: 5  8 6.5
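
A possible vectorized alternative within data.table, sketched here as an assumption rather than as part of the answer above, is to pass the selected columns to rowMeans() through .SDcols instead of grouping by row:

```r
library(data.table)

dt <- data.table(A = 1:5, B = c(1, 4, NA, 6, 8))

# .SDcols restricts .SD to A and B; rowMeans() then averages across them per row
dt[, Avg := rowMeans(.SD, na.rm = TRUE), .SDcols = c("A", "B")]

dt$Avg
```

This avoids creating one group per row, which may matter on larger tables.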

akrun answered Oct 21 '22 10:10