Conditional subsetting by group

Tags: r

Example data:

mydf<-data.frame(Group_ID=c("337", "337", "201", "201", "470", "470", "999", "999"), 
                              Timestamp=c("A", "A", "B", "B", "C", "D", "E", "F"), 
                              MU=as.numeric(c("1", "1", "2", "3", "4", "4", "5", "6")))

Gives:

    Group_ID Timestamp MU
         337         A  1
         337         A  1
         201         B  2
         201         B  3
         470         C  4
         470         D  4
         999         E  5
         999         F  6

Where MU is greater than 1, I would like to only retain the first entry within Group_ID. Where MU is <= 1, I would like to keep all entries for that group. Thus,

Desired result:

    Group_ID Timestamp MU
         337         A  1
         337         A  1
         201         B  2
         470         C  4
         999         E  5

I've made many attempts, the closest being the example below. However, this solution is wrong because it keeps only the first entry of every group, so the group with MU <= 1 also loses rows.

Best attempt:

mydf <- mydf[(mydf$MU >= 1),] %>%            
  group_by(Group_ID) %>% 
  slice(1:1)  

Returns undesired result (group 337, where MU <= 1, is cut to one row rather than kept in full):

Group_ID Timestamp    MU
     201         B     2
     337         A     1
     470         C     4
     999         E     5

I'm surprised this attempt doesn't work. What is it missing? I've also tried ifelse statements. Many thanks in advance.

Emily asked Apr 08 '26 20:04


2 Answers

library(dplyr)

mydf %>%
    group_by(Group_ID) %>%
    # keep rows until a second MU > 1 row is reached within the group
    filter(cumsum(MU > 1) <= 1) %>%
    ungroup()
# A tibble: 5 x 3
#  Group_ID Timestamp    MU
#  <fct>    <fct>     <dbl>
#1 337      A             1
#2 337      A             1
#3 201      B             2
#4 470      C             4
#5 999      E             5

The base R equivalent would be:

mydf[with(mydf, ave(MU > 1, Group_ID, FUN = cumsum) <= 1),]
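
To see why the cumsum condition works, it can help to look at the running count it builds within each group. The following is a minimal base R sketch, not part of the original answer; the columns `running` and `keep` are hypothetical helpers added only for illustration.

```r
# Rebuild the example data from the question.
mydf <- data.frame(
  Group_ID  = c("337", "337", "201", "201", "470", "470", "999", "999"),
  Timestamp = c("A", "A", "B", "B", "C", "D", "E", "F"),
  MU        = c(1, 1, 2, 3, 4, 4, 5, 6)
)

# Running count, within each group, of rows seen so far with MU > 1.
# (hypothetical helper column, for illustration only)
mydf$running <- ave(as.integer(mydf$MU > 1), mydf$Group_ID, FUN = cumsum)
# running is 0 0 1 2 1 2 1 2

# A row is kept while at most one MU > 1 row has been seen in its group.
mydf$keep <- mydf$running <= 1

print(mydf)

# Subset to the kept rows and drop the helper columns.
result <- mydf[mydf$keep, c("Group_ID", "Timestamp", "MU")]
print(result)
```

Group 337 never has MU > 1, so its count stays at 0 and both rows survive. In every other group the first row brings the count to 1 (kept) and the second brings it to 2 (dropped).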
d.b answered Apr 12 '26 10:04


We can group by 'Group_ID' and slice based on whether any element of 'MU' in the group is greater than 1:

library(dplyr)
mydf %>% 
  group_by(Group_ID = factor(Group_ID, levels = unique(Group_ID))) %>%
  slice(if (any(MU > 1)) 1 else row_number())
# A tibble: 5 x 3
# Groups:   Group_ID [4]
#  Group_ID Timestamp    MU
#  <fct>    <fct>     <dbl>
#1 337      A             1
#2 337      A             1
#3 201      B             2
#4 470      C             4
#5 999      E             5
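
Both answers return the same rows for the example data, but they appear to encode slightly different rules, which may matter if a group mixes MU <= 1 and MU > 1 rows. The sketch below is an illustration in base R under that assumption; the data frame `df` and its groups "X" and "Y" are hypothetical and do not come from the question.

```r
# Hypothetical data: group "X" mixes MU <= 1 and MU > 1 rows.
df <- data.frame(
  Group_ID = c("X", "X", "X", "Y", "Y"),
  MU       = c(1, 2, 1, 1, 1)
)

# Rule from the cumsum approach: keep rows until a second MU > 1 row
# appears within the group.
keep_cumsum <- ave(as.integer(df$MU > 1), df$Group_ID, FUN = cumsum) <= 1

# Rule from the slice approach: if any MU > 1 in the group, keep only the
# group's first row; otherwise keep every row of the group.
keep_slice <- as.logical(ave(df$MU, df$Group_ID, FUN = function(mu) {
  if (any(mu > 1)) seq_along(mu) == 1 else rep(TRUE, length(mu))
}))

# Side-by-side view of which rows each rule would keep.
print(cbind(df, keep_cumsum, keep_slice))
```

For group "X" the cumsum rule keeps all three rows, while the slice rule keeps only the first. Which behaviour is wanted depends on how the requirement in the question is read for mixed groups.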
akrun answered Apr 12 '26 08:04


