Example data:
mydf<-data.frame(Group_ID=c("337", "337", "201", "201", "470", "470", "999", "999"),
Timestamp=c("A", "A", "B", "B", "C", "D", "E", "F"),
MU=as.numeric(c("1", "1", "2", "3", "4", "4", "5", "6")))
Gives:
Group_ID Timestamp MU
337 A 1
337 A 1
201 B 2
201 B 3
470 C 4
470 D 4
999 E 5
999 F 6
Where MU is greater than 1, I would like to only retain the first entry within Group_ID. Where MU is <= 1, I would like to keep all entries for that group. Thus,
Desired result:
Group_ID Timestamp MU
337 A 1
337 A 1
201 B 2
470 C 4
999 E 5
I've made many attempts, the closest being the example below. However, this solution is wrong because it excludes all entries where MU <= 1.
Best attempt:
mydf <- mydf[(mydf$MU >= 1),] %>%
group_by(Group_ID) %>%
slice(1:1)
Returns undesired result (only one row per group, so the repeated MU <= 1 row for group 337 is dropped rather than retained):
Group_ID Timestamp MU
201 B 2
337 A 1
470 C 4
999 E 5
I'm surprised this attempt doesn't work. What is it missing? I've also tried ifelse statements. Many thanks in advance.
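library(dplyr)

mydf %>%
  group_by(Group_ID) %>%
  # keep rows up to (not including) the second MU > 1 in each group;
  # groups with no MU > 1 are kept whole
  filter(cumsum(MU > 1) <= 1) %>%
  ungroup()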
# A tibble: 5 x 3
# Group_ID Timestamp MU
# <fct> <fct> <dbl>
#1 337 A 1
#2 337 A 1
#3 201 B 2
#4 470 C 4
#5 999 E 5
A base R equivalent would be:
mydf[with(mydf, ave(MU > 1, Group_ID, FUN = cumsum) <= 1),]
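To see why the cumulative sum does the job, it can help to look at the intermediate counter next to the data. The following is only a sketch in base R that rebuilds the example data from the question; the column name `n_above` is made up for illustration and is not part of the answer above.

```r
# Rebuild the example data from the question
mydf <- data.frame(
  Group_ID  = c("337", "337", "201", "201", "470", "470", "999", "999"),
  Timestamp = c("A", "A", "B", "B", "C", "D", "E", "F"),
  MU        = c(1, 1, 2, 3, 4, 4, 5, 6)
)

# Running count, within each group, of rows seen so far with MU > 1
# (n_above is a hypothetical helper column used only for this illustration)
mydf$n_above <- ave(as.integer(mydf$MU > 1), mydf$Group_ID, FUN = cumsum)

# Keep rows where the counter is still 0 or 1
result <- mydf[mydf$n_above <= 1, ]
print(result)
```

Group 337 never has MU > 1, so its counter stays at 0 and both rows survive. In the other groups the counter reaches 2 on the second row, which is therefore dropped.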
We can group by 'Group_ID' and slice based on whether any element of 'MU' in the group is greater than 1:
library(dplyr)
mydf %>%
  group_by(Group_ID = factor(Group_ID, levels = unique(Group_ID))) %>%
  slice(if (any(MU > 1)) 1 else row_number())
# A tibble: 5 x 3
# Groups: Group_ID [4]
# Group_ID Timestamp MU
# <fct> <fct> <dbl>
#1 337 A 1
#2 337 A 1
#3 201 B 2
#4 470 C 4
#5 999 E 5
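Both answers agree on the example data, but they may differ for a group that mixes MU <= 1 and MU > 1, a case the question does not show. The sketch below expresses the two rules in base R (a stand-in for the dplyr code, not the answers' exact code) on a hypothetical group `555`; which behaviour is wanted depends on the intent.

```r
# Hypothetical group: first row has MU <= 1, later rows have MU > 1
mixed <- data.frame(
  Group_ID  = c("555", "555", "555"),
  Timestamp = c("G", "H", "I"),
  MU        = c(1, 2, 3)
)

# Rule 1 (cumulative-sum filter): the running count of MU > 1 must not exceed 1
keep_cumsum <- ave(as.integer(mixed$MU > 1), mixed$Group_ID, FUN = cumsum) <= 1

# Rule 2 (any-based slice): if any MU > 1 in the group, keep only its first row
has_above <- ave(mixed$MU > 1, mixed$Group_ID, FUN = any)
keep_any  <- !has_above | !duplicated(mixed$Group_ID)

mixed[keep_cumsum, ]  # rows G and H
mixed[keep_any, ]     # row G only
```

If groups are always entirely above or entirely at or below 1, as in the example data, the two rules select the same rows.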