
Mean excluding zero and NA for all columns with dplyr

Tags:

r

dplyr

I want to compute the mean of every column of my data frame with the dplyr package.

n = c(NA, 3, 5) 
s = c("aa", "bb", "cc") 
b = c(3, 0, 5) 
df = data.frame(n, s, b)

Here I want my function to return mean = 4 for both the n and b columns. I tried mean(df$n[df$n>0]), but that is not practical for a large data frame. I want something like df %>% summarise_each(funs(mean)) ... Thanks

Mostafa790 asked Mar 09 '16 17:03


2 Answers

If you don't want the 0s, you are probably treating them as NAs, so let's make that explicit and then summarize the numeric columns with na.rm = TRUE:

library(dplyr)
df[df==0] <- NA
summarize_if(df, is.numeric, mean, na.rm = TRUE)
#   n b
# 1 4 4

As a one-liner:

summarize_if(`[<-`(df, df==0, value= NA), is.numeric, mean, na.rm = TRUE)

And in base R (the result is a named numeric vector):

sapply(`[<-`(df, df==0, value= NA)[sapply(df, is.numeric)], mean, na.rm=TRUE)
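
If overwriting the zeros in df is undesirable, the same result can be sketched in base R with a small helper that does the filtering inside the mean. The name mean_nonzero is hypothetical and not part of the answer above; the data frame is the one from the question.

```r
# Example data from the question
df <- data.frame(
  n = c(NA, 3, 5),
  s = c("aa", "bb", "cc"),
  b = c(3, 0, 5)
)

# Hypothetical helper: mean of x after dropping NAs and zeros
mean_nonzero <- function(x) mean(x[!is.na(x) & x != 0])

# Pick out the numeric columns, then apply the helper to each one
num_cols <- vapply(df, is.numeric, logical(1))
res <- vapply(df[num_cols], mean_nonzero, numeric(1))
res  # named numeric vector: n = 4, b = 4
```

Note that a column containing only zeros and NAs yields NaN here, because mean() of an empty vector is NaN.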
Moody_Mudskipper answered Nov 30 '22 07:11


See David's elegant answer:

df %>% summarise_each(funs(mean(.[!is.na(.) & . != 0])), -s) 

Or

df %>% summarise_each(funs(mean(.[. != 0], na.rm = TRUE)), -s)
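
summarise_each() and funs() have since been deprecated in dplyr. A minimal sketch of the same idea, assuming dplyr 1.0.0 or later, uses across() with a lambda; selecting with where(is.numeric) instead of -s is a substitution made here, not part of the original answer.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  n = c(NA, 3, 5),
  s = c("aa", "bb", "cc"),
  b = c(3, 0, 5)
)

# For every numeric column: drop zeros, then average while ignoring NAs
res <- df %>%
  summarise(across(where(is.numeric), ~ mean(.x[.x != 0], na.rm = TRUE)))
res  # one-row data frame with n = 4 and b = 4
```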
Mostafa790 answered Nov 30 '22 07:11