
Why does complete() create duplicate rows in my data?

Tags: r, tidyr

When I use complete() to fill in the rows of my data that have no cases, it also creates many duplicate rows. I can remove these with unique(), but I want to understand how to avoid generating the extra rows in the first place.

library(dplyr)
library(tidyr)

# An incomplete table
mtcars %>% 
  group_by(vs, cyl) %>% 
  count()

# complete() creates a table with many duplicate rows
temp <- 
  mtcars %>% 
  group_by(vs, cyl) %>% 
  count() %>% 
  complete(vs = c(0, 1), cyl = c(4, 6, 8), fill = list(n = 0)) 

unique(temp)
Asked by Joe, Feb 01 '18 23:02


1 Answer

This is answered in a comment by @aosmith.

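The duplicates come from the grouping. The result of count() is still grouped by vs and cyl, and complete() works within each group, so the full set of vs/cyl combinations is generated once per group instead of once for the whole table. Calling ungroup() before complete() solves the issue: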

temp <- 
  mtcars %>% 
  group_by(vs, cyl) %>% 
  count() %>% 
  ungroup() %>%
  complete(vs = c(0, 1), cyl = c(4, 6, 8), fill = list(n = 0)) 
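
As a variation on the same idea, here is a minimal sketch that never creates the grouping in the first place. It assumes that count(vs, cyl) called on an ungrouped data frame returns an ungrouped result, and that the values of vs and cyl already present in the data are the ones you want to complete over. The name counts is only illustrative. As far as I know, recent tidyr releases refuse to complete a grouping column at all, so dropping the groups first is needed there as well.

```r
library(dplyr)
library(tidyr)

# Count the vs/cyl combinations without group_by(), so the result
# carries no grouping into complete().
counts <-
  mtcars %>%
  count(vs, cyl) %>%
  # Bare column names expand to every combination of the values
  # observed in the data; missing combinations get n = 0.
  complete(vs, cyl, fill = list(n = 0))

nrow(counts)           # one row per vs/cyl combination
anyDuplicated(counts)  # 0 when no row is repeated
```

If you need levels that never occur in the data, pass them explicitly as in the code above, for example cyl = c(4, 6, 8).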
Answered by alko989, Oct 31 '22 15:10