It's a bit tricky to explain, but I'll try my best. I have a df as below. For each group, I need to keep the row with the maximum pop whose country has not already occurred in an earlier group. (In the expected output (image), A does not appear in Group 2 because it had already occurred in Group 1.)
In short, I need unique values in the country column while also getting the maximum value of pop at the group level. I hope the picture conveys what I could not. (Tidyverse solution preferred.)
df<- structure(list(Group = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), country = c("A", "B", "C", "A", "E", "F", "A", "E", "G"), pop = c(200L, 100L, 50L, 200L, 150L, 120L, 200L, 150L,
100L)), class = "data.frame", row.names = c(NA, -9L))
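I think this will do. Explanation of the syntax:
group_split(Group) splits df into a list of tibbles, one per group.
purrr::reduce is used here to reduce that list of tibbles to a single tibble.
.init is set to the first group, filtered to its max pop row.
At each step, anti_join drops from the next group the countries already selected, and the result is filtered for the max of pop again.
bind_rows() then appends the rows selected so far.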
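library(dplyr)
library(purrr)
groups <- df %>% group_split(Group)

# Start from the max-pop row of the first group, then fold in the remaining groups.
groups[-1] %>%
  reduce(
    .init = groups[[1]] %>% filter(pop == max(pop)),
    ~ .y %>%
      anti_join(.x, by = "country") %>%  # drop countries already selected
      filter(pop == max(pop)) %>%        # best remaining row in this group
      bind_rows(.x) %>% arrange(Group)
  )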
# A tibble: 3 x 3
Group country pop
<int> <chr> <int>
1 1 A 200
2 2 E 150
3 3 G 100
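For comparison, the same greedy idea can be written as an explicit loop, which some readers may find easier to follow than reduce. This is only a sketch under a few assumptions: pick_unique_max is a hypothetical helper name (not part of dplyr or the answer above), groups are processed in ascending order of Group, and ties on pop are broken by keeping a single row (the reduce version above keeps all tied rows instead).

```r
library(dplyr)

df <- data.frame(
  Group   = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  country = c("A", "B", "C", "A", "E", "F", "A", "E", "G"),
  pop     = c(200L, 100L, 50L, 200L, 150L, 120L, 200L, 150L, 100L)
)

# Hypothetical helper (an assumption for illustration): walk the groups in
# order, remembering which countries have already been picked.
pick_unique_max <- function(data) {
  seen   <- character(0)  # countries chosen in earlier groups
  picked <- list()        # one selected row per group

  for (g in sort(unique(data$Group))) {
    # Rows of this group whose country has not been used yet.
    candidates <- data %>%
      filter(Group == g, !country %in% seen)

    # Skip a group whose countries were all used already.
    if (nrow(candidates) == 0) next

    # Keep exactly one row, even if several share the maximum pop.
    best <- candidates %>% slice_max(pop, n = 1, with_ties = FALSE)

    seen <- c(seen, best$country)
    picked[[length(picked) + 1]] <- best
  }

  bind_rows(picked)
}

result <- pick_unique_max(df)
result
```

On the sample df this selects A, E and G for groups 1, 2 and 3, the same rows as the reduce version. The explicit empty-group check is a design choice of this sketch: it avoids calling max() on zero rows when every country in a group has already been taken.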