dplyr summarise: Equivalent of ".drop=FALSE" to keep groups with zero length in output

When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this doesn't work when using summarise with dplyr. Is there another way to keep empty categories in the result?

Here's an example with fake data.

    library(dplyr)

    df = data.frame(a=rep(1:3,4), b=rep(1:2,6))

    # Now add an extra level to df$b that has no corresponding value in df$a
    df$b = factor(df$b, levels=1:3)

    # Summarise with plyr, keeping categories with a count of zero
    plyr::ddply(df, "b", summarise, count_a=length(a), .drop=FALSE)

      b count_a
    1 1       6
    2 2       6
    3 3       0

    # Now try it with dplyr
    df %>%
      group_by(b) %>%
      summarise(count_a=length(a), .drop=FALSE)

      b count_a .drop
    1 1       6 FALSE
    2 2       6 FALSE

Not exactly what I was hoping for. Is there a dplyr method for achieving the same result as .drop=FALSE in plyr?


asked Mar 20 '14 03:03

eipi10

2 Answers

The issue is still open, but in the meantime, especially since your data are already factored, you can use complete from "tidyr" to get what you might be looking for:

    library(tidyr)

    df %>%
      group_by(b) %>%
      summarise(count_a=length(a)) %>%
      complete(b)
    # Source: local data frame [3 x 2]
    #
    #        b count_a
    #   (fctr)   (int)
    # 1      1       6
    # 2      2       6
    # 3      3      NA

If you want the replacement value to be zero instead of NA, specify that with fill:

    df %>%
      group_by(b) %>%
      summarise(count_a=length(a)) %>%
      complete(b, fill = list(count_a = 0))
    # Source: local data frame [3 x 2]
    #
    #        b count_a
    #   (fctr)   (dbl)
    # 1      1       6
    # 2      2       6
    # 3      3       0
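Note that filling with 0 (a double) is what turned count_a into a dbl column in the output above. A minimal self-contained sketch, assuming you would rather keep an integer count: use n() for the count and fill with the integer literal 0L. The name `counts` is just for illustration.

```r
library(dplyr)
library(tidyr)

# Same fake data as in the question: level 3 of b has no rows
df <- data.frame(a = rep(1:3, 4), b = rep(1:2, 6))
df$b <- factor(df$b, levels = 1:3)

counts <- df %>%
  group_by(b) %>%
  summarise(count_a = n()) %>%            # n() returns an integer count
  complete(b, fill = list(count_a = 0L))  # 0L keeps the column integer

counts
```

Because b is a factor, complete() expands over all of its levels, including the one that never occurs in the data.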

answered Oct 01 '22 14:10

A5C1D2H2I1M1N2O1R2T1

Since dplyr 0.8, group_by has a .drop argument that does just what you asked for:

    df = data.frame(a=rep(1:3,4), b=rep(1:2,6))
    df$b = factor(df$b, levels=1:3)

    df %>%
      group_by(b, .drop=FALSE) %>%
      summarise(count_a=length(a))

    #> # A tibble: 3 x 2
    #>   b     count_a
    #>   <fct>   <int>
    #> 1 1           6
    #> 2 2           6
    #> 3 3           0
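If all you need is a count, a shorter variant should also work: count() takes the same .drop argument in dplyr 0.8 and later. A minimal sketch with the same fake data; the name argument is only there to match the count_a column used in this thread, and `counts` is an illustrative variable name.

```r
library(dplyr)

# Same fake data as above: level 3 of b has no rows
df <- data.frame(a = rep(1:3, 4), b = rep(1:2, 6))
df$b <- factor(df$b, levels = 1:3)

# count() groups by b, tallies the rows, and keeps empty factor levels
counts <- df %>%
  count(b, name = "count_a", .drop = FALSE)

counts
```

As with group_by, this relies on b being a factor, since the empty level is only known from the factor's levels.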

One additional note to go with @Moody_Mudskipper's answer: Using .drop=FALSE can give potentially unexpected results when one or more grouping variables are not coded as factors. See examples below:

    library(dplyr)
    data(iris)

    # Add an additional level to Species
    iris$Species = factor(iris$Species, levels=c(levels(iris$Species), "empty_level"))

    # Species is a factor and empty groups are included in the output
    iris %>% group_by(Species, .drop=FALSE) %>% tally

    #>   Species         n
    #> 1 setosa         50
    #> 2 versicolor     50
    #> 3 virginica      50
    #> 4 empty_level     0

    # Add character column
    iris$group2 = c(rep(c("A","B"), 50), rep(c("B","C"), each=25))

    # Empty groups involving combinations of Species and group2 are not included in output
    iris %>% group_by(Species, group2, .drop=FALSE) %>% tally

    #>   Species     group2     n
    #> 1 setosa      A         25
    #> 2 setosa      B         25
    #> 3 versicolor  A         25
    #> 4 versicolor  B         25
    #> 5 virginica   B         25
    #> 6 virginica   C         25
    #> 7 empty_level <NA>       0

    # Turn group2 into a factor
    iris$group2 = factor(iris$group2)

    # Now all possible combinations of Species and group2 are included in the output,
    #  whether present in the data or not
    iris %>% group_by(Species, group2, .drop=FALSE) %>% tally

    #>    Species     group2     n
    #>  1 setosa      A         25
    #>  2 setosa      B         25
    #>  3 setosa      C          0
    #>  4 versicolor  A         25
    #>  5 versicolor  B         25
    #>  6 versicolor  C          0
    #>  7 virginica   A          0
    #>  8 virginica   B         25
    #>  9 virginica   C         25
    #> 10 empty_level A          0
    #> 11 empty_level B          0
    #> 12 empty_level C          0

Created on 2019-03-13 by the reprex package (v0.2.1)
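If you would rather leave the character column as it is, one possible alternative is to combine this with the tidyr approach from the other answer: count only the combinations that occur, then let complete() add the missing ones. This is a sketch under the assumption that the observed values of group2 are the full set you care about, since complete() can only expand a character column over the values it actually sees. The names `iris2` and `counts` are made up for the example.

```r
library(dplyr)
library(tidyr)

# Work on a copy so the built-in iris data stays untouched
iris2 <- iris
iris2$Species <- factor(iris2$Species,
                        levels = c(levels(iris2$Species), "empty_level"))
iris2$group2 <- c(rep(c("A", "B"), 50), rep(c("B", "C"), each = 25))  # character

counts <- iris2 %>%
  count(Species, group2) %>%                        # observed combinations only
  complete(Species, group2, fill = list(n = 0L))    # add the missing ones as 0

counts
```

Species is expanded over all of its factor levels and group2 over its observed values, so the result has the same 12 combinations as the all-factor example above.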

answered Oct 01 '22 12:10

Moody_Mudskipper

Tags: r, dplyr, plyr, tidyr
