
How to filter rows between two specific values

Tags:

r

I need help filtering the following dataframe (this is a simple example):

mx = as.data.frame(cbind(c("-", "-", "-", "-", "mutation", "+", "+", "+", "+") ,
                         c(F, T, F, F, F, F, T, F,T)) )
colnames(mx) = c("mutation", "distance")
mx
  mutation distance
1        -    FALSE
2        -     TRUE
3        -    FALSE
4        -    FALSE
5 mutation    FALSE
6        +    FALSE
7        +     TRUE
8        +    FALSE
9        +     TRUE

I need to filter based on the second column (distance), so that it looks like this:

  mutation distance
3        -    FALSE
4        -    FALSE
5 mutation    FALSE
6        +    FALSE

I need to remove all rows up to and including the last TRUE that occurs before the row where mx$mutation == "mutation" (so rows 1 and 2), and all rows from the first TRUE that occurs after that row onward (so row 7 and beyond).

Haloom asked Jan 16 '18 06:01


1 Answer

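We can create a grouping variable by taking the cumulative sum of the logical column ('distance'), so that every TRUE starts a new group, and then filter: keep only the group that contains the "mutation" row, and drop the TRUE row that opens that group.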

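library(dplyr)
mx %>%
  # each TRUE in 'distance' starts a new group
  group_by(grp = cumsum(distance)) %>%
  # keep the group holding "mutation", minus the TRUE row that opens it
  filter(any(mutation == "mutation") & !distance) %>%
  ungroup() %>%
  select(-grp)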
# A tibble: 4 x 2
# mutation distance
#  <fctr>   <lgl>   
#1 -        F       
#2 -        F       
#3 mutation F       
#4 +        F       
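
For readers who want to see the intermediate grouping variable, the same idea can be sketched in base R. This is only an illustration, not part of the answer above: the names `grp`, `target` and `res` are made up for the example, and it assumes `distance` is a genuine logical column.

```r
# Assumption: 'distance' is logical, not character or factor.
mx <- data.frame(
  mutation = c("-", "-", "-", "-", "mutation", "+", "+", "+", "+"),
  distance = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
)

# Each TRUE bumps the running total, so rows between two TRUEs share an id.
grp <- cumsum(mx$distance)
grp
# 0 1 1 1 1 1 2 2 3

# Group id of the row flagged "mutation" (hypothetical helper name).
target <- grp[mx$mutation == "mutation"]

# Keep rows in that group, dropping the TRUE row that opens it.
res <- mx[grp %in% target & !mx$distance, ]
res
```

Here `res` holds rows 3 to 6 of the original data, matching the dplyr result.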

NOTE: We can create the data.frame directly with data.frame(). There is no need for cbind, and using it adversely affects the column types: cbind builds a matrix, a matrix can hold only a single type, and so the logical values are coerced to character.
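
A small sketch of that coercion, using a shortened three-row version of the data (the object names `m` and `df` are only for illustration):

```r
# cbind() on a character vector and a logical vector yields a character matrix.
m <- cbind(c("-", "mutation", "+"), c(FALSE, TRUE, FALSE))
typeof(m)
# "character"

# data.frame() keeps each column's own type.
df <- data.frame(
  mutation = c("-", "mutation", "+"),
  distance = c(FALSE, TRUE, FALSE)
)
is.logical(df$distance)
# TRUE
```

Because cumsum() and the `!` operator need logical (or numeric) input, the grouping approach relies on the second form.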

data

mx = data.frame(mutation = c("-", "-", "-", "-", "mutation", "+", "+", "+", "+"),
                distance = c(F, T, F, F, F, F, T, F, T))

akrun answered Nov 15 '22 03:11