I have a data frame from which I want to drop whole groups if the first row of the group does not meet a particular condition in one column.
An example using the following dataset:
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'D', 'D', 'D', 'E', 'E'), gameplayed=c('Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes'))
I want to group these by 'team' first. Then, I want to remove the entire group if the first row contains a 'No' in the 'gameplayed' column.
This would be the desired output:
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'C', 'D', 'D', 'D'), gameplayed=c('Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes'))
I've played around with various options, such as the following, but can't get it to work for me:
df %>% group_by(team) %>% filter("Yes" == first(gameplayed))
You can use ave like this:
df[ave(df$gameplayed != "No", df$team, FUN=\(x) x[1]),]
#df[ave(df$gameplayed, df$team, FUN=\(x) x[1]) != "No",] #Alternative
#   team gameplayed
#1     A        Yes
#2     A         No
#3     A        Yes
#4     A        Yes
#8     C        Yes
#9     D        Yes
#10    D         No
#11    D        Yes
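To make the mechanics a bit more visible, here is a minimal sketch that splits the one-liner into steps. It assumes the df from the question; the names keep and res are only illustrative, and function(x) is used instead of \(x) so it also runs on R versions before 4.1.

```r
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'D', 'D', 'D', 'E', 'E'),
                 gameplayed=c('Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes'))

# Row-wise test: TRUE where the row is not a 'No'
not_no <- df$gameplayed != "No"

# ave() applies FUN to each team's slice and writes the result back to
# every row of that team, so each row carries the flag of its group's first row
keep <- ave(not_no, df$team, FUN = function(x) x[1])

# Logical row index: keeps only the teams whose first row is not 'No'
res <- df[keep, ]
res
```

Because keep has one entry per row of df, it can be used directly as a row index without any merge or join.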
Benchmark
library(tidyverse)
bench::mark(check=FALSE,
GKi = df[ave(df$gameplayed != "No", df$team, FUN=\(x) x[1]),],
Limney = df %>% group_by(team) %>% filter(first(gameplayed) == "Yes") %>% ungroup(),
Elias = df %>% group_by(team) %>% mutate(id = row_number()) %>% filter(!any(gameplayed == "No" && id == 1)) #With warnings: && on length > 1 vectors (an error from R 4.3.0)
)
#  expression      min   median itr/s…¹ mem_a…² gc/se…³ n_itr  n_gc total…⁴ result
#  <bch:expr> <bch:tm>  <bch:t>   <dbl> <bch:b>   <dbl> <int> <dbl> <bch:t> <list>
#1 GKi         75.74µs  82.05µs  11936.  4.52KB    16.8  5685     8   476ms <NULL>
#2 Limney       4.52ms    4.6ms    214.  7.92KB    15.6    96     7   448ms <NULL>
#3 Elias        3.36ms   3.42ms    291.  9.53KB    13.0   134     6   461ms <NULL>
In this case, GKi is about 40 times faster than Elias and more than 50 times faster than Limney (by median time), and it also allocates less memory.
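Since the benchmark runs with check=FALSE, the results are not compared against each other. As a rough sanity check, the sketch below confirms that both ave variants reproduce the desired df2 from the question once the row names are reset. The names res1 and res2 are illustrative, and only base R is used.

```r
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'D', 'D', 'D', 'E', 'E'),
                 gameplayed=c('Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes'))
df2 <- data.frame(team=c('A', 'A', 'A', 'A', 'C', 'D', 'D', 'D'),
                  gameplayed=c('Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes'))

# Variant 1: compare first, then take the first flag per team
res1 <- df[ave(df$gameplayed != "No", df$team, FUN = function(x) x[1]), ]
# Variant 2: take the first value per team, then compare
res2 <- df[ave(df$gameplayed, df$team, FUN = function(x) x[1]) != "No", ]

# Subsetting keeps the original row names (1-4, 8-11); reset them before comparing
rownames(res1) <- NULL
rownames(res2) <- NULL

same1 <- isTRUE(all.equal(res1, df2))
same2 <- isTRUE(all.equal(res2, df2))
c(same1, same2)
```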