
Dplyr - Mean for multiple columns

Tags:

r

dplyr

I want to calculate the mean of several columns and store it in a new column, using dplyr and without melting and merging.

> head(growth2)
  CODE_COUNTRY CODE_PLOT IV12_ha_yr IV23_ha_yr IV34_ha_yr IV14_ha_yr IV24_ha_yr IV13_ha_yr
1            1         6       4.10       6.97         NA         NA         NA       4.58
2            1        17       9.88       8.75         NA         NA         NA       8.25
3            1        30         NA         NA         NA         NA         NA         NA
4            1        37      15.43      15.07      11.89      10.00      12.09      14.33
5            1        41      20.21      15.01      14.72      11.31      13.27      17.09
6            1        46      12.64      14.36      13.65       9.07      12.47      12.36

I need a new column within the dataset with the mean of all the IV columns. I tried this:

growth2 %>%
  group_by(CODE_COUNTRY, CODE_PLOT) %>%
  summarise(IVmean = mean(IV12_ha_yr:IV13_ha_yr, na.rm = TRUE))

This returned several errors, depending on the example used, such as:

Error in NA_real_:NA_real_ : NA/NaN argument 

or

Error in if (trim > 0 && n) { : missing value where TRUE/FALSE needed 
fede_luppi asked Feb 26 '15 14:02




2 Answers

You don't need to group; just select() the IV columns and compute their row means inside mutate().

library(dplyr)
mutate(df, IVMean = rowMeans(select(df, starts_with("IV")), na.rm = TRUE))
Rich Scriven answered Oct 12 '22 14:10

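To make the select() plus rowMeans() approach concrete, here is a minimal sketch that rebuilds the six rows shown in the question and applies it. The data frame is typed in by hand from the head(growth2) output, and the names result and IVmean are only illustrative. It also helps explain the original errors: inside summarise(), IV12_ha_yr:IV13_ha_yr is evaluated as the ordinary `:` sequence operator on the column values rather than as a column range, which fails as soon as those values are NA.

```r
library(dplyr)

# First six rows of growth2, copied by hand from the question
growth2 <- data.frame(
  CODE_COUNTRY = c(1, 1, 1, 1, 1, 1),
  CODE_PLOT    = c(6, 17, 30, 37, 41, 46),
  IV12_ha_yr   = c(4.10, 9.88, NA, 15.43, 20.21, 12.64),
  IV23_ha_yr   = c(6.97, 8.75, NA, 15.07, 15.01, 14.36),
  IV34_ha_yr   = c(NA, NA, NA, 11.89, 14.72, 13.65),
  IV14_ha_yr   = c(NA, NA, NA, 10.00, 11.31, 9.07),
  IV24_ha_yr   = c(NA, NA, NA, 12.09, 13.27, 12.47),
  IV13_ha_yr   = c(4.58, 8.25, NA, 14.33, 17.09, 12.36)
)

# Row-wise mean over every column whose name starts with "IV",
# ignoring missing values within each row
result <- mutate(
  growth2,
  IVmean = rowMeans(select(growth2, starts_with("IV")), na.rm = TRUE)
)

print(result$IVmean)
```

Note that plot 30 has no IV values at all, so with na.rm = TRUE its mean comes out as NaN rather than NA. If that distinction matters, the result can be tested with is.nan() afterwards.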


Use the . placeholder with the pipe: inside a %>% chain, . refers to the data frame being piped in.

library(dplyr)
df %>%
  mutate(IVMean = rowMeans(select(., starts_with("IV")), na.rm = TRUE))
Shixiang Wang answered Oct 12 '22 12:10

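Neither answer above uses it, but assuming dplyr 1.0.0 or later, the same row mean can also be sketched with rowwise() and c_across(). The small data frame below is hypothetical and only mirrors the layout of the question's data. This form is usually slower than rowMeans() on large data, but it works with any summary function, not just the mean.

```r
library(dplyr)

# Small hypothetical data frame mirroring the question's layout
df <- data.frame(
  CODE_PLOT  = c(6, 17, 30),
  IV12_ha_yr = c(4.10, 9.88, NA),
  IV23_ha_yr = c(6.97, 8.75, NA),
  IV13_ha_yr = c(4.58, 8.25, NA)
)

out <- df %>%
  rowwise() %>%                  # treat each row as its own group
  mutate(IVMean = mean(c_across(starts_with("IV")), na.rm = TRUE)) %>%
  ungroup()                      # drop the row-wise grouping again

print(out$IVMean)
```

As with rowMeans(), a row whose IV columns are all NA yields NaN when na.rm = TRUE.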