I'm trying to fill missing values in my dataframe, but I do not want all possible combinations of variables - I only want to fill based on a grouping of three variables: coursecode
, year
, and week
.
I've looked into complete()
in tidyr library but I can't get it to work, even after looking at Using tidyr::complete with group_by and https://blog.rstudio.org/2015/09/13/tidyr-0-3-0/
I have observers that collect data on given weeks of the year at different courses. For example, data might be collected in my larger dataset for weeks 1-10, but I only care about the missing weeks that occurred in a particular course-year combination. e.g.,
Example:
library(dplyr)
library(tidyr)
df <- data.frame(coursecode = rep(c("A", "B"), each = 6),
year = rep(c(2000, 2000, 2000, 2001, 2001, 2001), 2),
week = c(1, 3, 4, 1, 2, 3, 2, 3, 5, 3, 4, 5),
values = c(1:12),
othervalues = c(12:23),
region = "Big")
df
coursecode year week values othervalues region
1 A 2000 1 1 12 Big
2 A 2000 3 2 13 Big
3 A 2000 4 3 14 Big
4 A 2001 1 4 15 Big
5 A 2001 2 5 16 Big
6 A 2001 3 6 17 Big
7 B 2000 2 7 18 Big
8 B 2000 3 8 19 Big
9 B 2000 5 9 20 Big
10 B 2001 3 10 21 Big
11 B 2001 4 11 22 Big
12 B 2001 5 12 23 Big
try with complete: (not my desired output)
df %>%
complete(coursecode, year, region, nesting(week))
# A tibble: 20 x 6
coursecode year region week values othervalues
<fctr> <dbl> <fctr> <dbl> <int> <int>
1 A 2000 Big 1 1 12
2 A 2000 Big 2 NA NA
3 A 2000 Big 3 2 13
4 A 2000 Big 4 3 14
5 A 2000 Big 5 NA NA
6 A 2001 Big 1 4 15
7 A 2001 Big 2 5 16
8 A 2001 Big 3 6 17
9 A 2001 Big 4 NA NA
10 A 2001 Big 5 NA NA
11 B 2000 Big 1 NA NA
12 B 2000 Big 2 7 18
13 B 2000 Big 3 8 19
14 B 2000 Big 4 NA NA
15 B 2000 Big 5 9 20
16 B 2001 Big 1 NA NA
17 B 2001 Big 2 NA NA
18 B 2001 Big 3 10 21
19 B 2001 Big 4 11 22
20 B 2001 Big 5 12 23
Desired output
coursecode year region week values othervalues
<fctr> <dbl> <fctr> <dbl> <int> <int>
1 A 2000 Big 1 1 12
2 A 2000 Big 2 NA NA
3 A 2000 Big 3 2 13
4 A 2000 Big 4 3 14
5 A 2001 Big 1 4 15
6 A 2001 Big 2 5 16
7 A 2001 Big 3 6 17
8 B 2000 Big 2 7 18
9 B 2000 Big 3 8 19
10 B 2000 Big 4 NA NA
11 B 2000 Big 5 9 20
12 B 2001 Big 3 10 21
13 B 2001 Big 4 11 22
14 B 2001 Big 5 12 23
We can try with expand
and left_join
library(dplyr)
library(tidyr)
df %>%
group_by(coursecode, year, region) %>%
expand(week = full_seq(week, 1)) %>%
left_join(., df)
# coursecode year region week values othervalues
# <fctr> <dbl> <fctr> <dbl> <int> <int>
#1 A 2000 Big 1 1 12
#2 A 2000 Big 2 NA NA
#3 A 2000 Big 3 2 13
#4 A 2000 Big 4 3 14
#5 A 2001 Big 1 4 15
#6 A 2001 Big 2 5 16
#7 A 2001 Big 3 6 17
#8 B 2000 Big 2 7 18
#9 B 2000 Big 3 8 19
#10 B 2000 Big 4 NA NA
#11 B 2000 Big 5 9 20
#12 B 2001 Big 3 10 21
#13 B 2001 Big 4 11 22
#14 B 2001 Big 5 12 23
As the OP was using complete()
(which is based on expand()
and left_join()
), one could stick to it and save themselves writing an extra line of code compared to @akrun's solution:
# example data
df <- data.frame(coursecode = rep(c("A", "B"), each = 6),
year = rep(c(2000, 2000, 2000, 2001, 2001, 2001), 2),
week = c(1, 3, 4, 1, 2, 3, 2, 3, 5, 3, 4, 5),
values = c(1:12),
othervalues = c(12:23),
region = "Big")
# complete by group
library(dplyr)
library(tidyr)
df %>%
group_by(coursecode, year, region) %>%
complete(week = full_seq(week, 1))
#> # A tibble: 14 x 6
#> # Groups: coursecode, year, region [4]
#> coursecode year region week values othervalues
#> <chr> <dbl> <chr> <dbl> <int> <int>
#> 1 A 2000 Big 1 1 12
#> 2 A 2000 Big 2 NA NA
#> 3 A 2000 Big 3 2 13
#> 4 A 2000 Big 4 3 14
#> 5 A 2001 Big 1 4 15
#> 6 A 2001 Big 2 5 16
#> 7 A 2001 Big 3 6 17
#> 8 B 2000 Big 2 7 18
#> 9 B 2000 Big 3 8 19
#> 10 B 2000 Big 4 NA NA
#> 11 B 2000 Big 5 9 20
#> 12 B 2001 Big 3 10 21
#> 13 B 2001 Big 4 11 22
#> 14 B 2001 Big 5 12 23
Created on 2020-10-29 by the reprex package (v0.3.0)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With