library(tidyverse)
I'm stuck on something that should be so simple! Using the code below, all I want to do is group and summarise the three "Var" columns. I want counts and sums (so that I can create three percentage columns, so bonus if you can include an easy way to accomplish this in your answer). However, I don't want to include the NA's. Removing the NA's from sum is easy enough by using "na.rm=TRUE", but I can't seem to figure out how to not include the NA's in the counts (using n() ) while using dplyr::summarise_at.
Am I missing something very simple?
Df%>%group_by(Group)%>%summarise_at(vars(Var1:Var3),funs(n(),sum((.),na.rm=TRUE)))
Group<-c("House","Condo","House","House","House","House","House","Condo")
Var1<-c(0,1,1,NA,1,1,1,0)
Var2<-c(1,1,1,1,0,1,1,1)
Var3<-c(1,1,1,NA,NA,1,1,0)
Df<-data.frame(Group,Var1,Var2,Var3)
I think your code was very close to getting the job done. I made some slight changes and have included an example of how you might include the percent calculation in the same step (although I am not sure of your expected output).
library(dplyr)
Df %>%
group_by(Group) %>%
summarise_all(funs(count = sum(!is.na(.)),
sum = sum(.,na.rm=TRUE),
pct = sum(.,na.rm=TRUE)/sum(!is.na(.))))
#> # A tibble: 2 x 10
#> Group Var1_count Var2_count Var3_count Var1_sum Var2_sum Var3_sum
#> <fctr> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 Condo 2 2 2 1 2 1
#> 2 House 5 6 4 4 5 4
#> # ... with 3 more variables: Var1_pct <dbl>, Var2_pct <dbl>,
#> # Var3_pct <dbl>
I've also used summarise_all
instead of summarise_at
as summarise_all
works on all the variables which aren't group
variables.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With