I need help filtering the following dataframe (this is a simple example):
mx = as.data.frame(cbind(c("-", "-", "-", "-", "mutation", "+", "+", "+", "+") ,
c(F, T, F, F, F, F, T, F,T)) )
colnames(mx) = c("mutation", "distance")
mx
mutation distance
1 - FALSE
2 - TRUE
3 - FALSE
4 - FALSE
5 mutation FALSE
6 + FALSE
7 + TRUE
8 + FALSE
9 + TRUE
I need to filter based on the second column (distance), so that it looks like this:
mutation distance
3 - FALSE
4 - FALSE
5 mutation FALSE
6 + FALSE
I need to remove all rows up to and including the last TRUE that occurs before the row where mx$mutation == "mutation" (so rows 1 and 2), and all rows from the first TRUE that occurs after that row onward (so row 7 and beyond).
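We can create a grouping variable from the cumulative sum of the logical column ('distance'), which increments at every TRUE. We then filter to keep only the group that contains the "mutation" row, dropping the TRUE row that starts that group: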
library(dplyr)
mx %>%
group_by(grp = cumsum(distance)) %>%
filter(any(mutation == "mutation") & !distance) %>%
ungroup %>%
select(-grp)
# A tibble: 4 x 2
# mutation distance
# <fctr> <lgl>
#1 - F
#2 - F
#3 mutation F
#4 + F
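To see why the grouping works, the same logic can be traced in base R. This is only a minimal sketch: it assumes 'distance' is a genuine logical vector and that exactly one row has the value "mutation". The names grp, target and keep are illustrative and not part of the answer above.

```r
# Columns from the question, stored as plain vectors with their proper types
mutation <- c("-", "-", "-", "-", "mutation", "+", "+", "+", "+")
distance <- c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)

# The cumulative sum increases by one at every TRUE, so each TRUE starts a new group
grp <- cumsum(distance)
grp
# 0 1 1 1 1 1 2 2 3

# Group id of the "mutation" row (assumes exactly one such row)
target <- grp[mutation == "mutation"]

# Keep rows in that group, except the TRUE row that opens it
keep <- grp == target & !distance
which(keep)
# 3 4 5 6
```

Rows 3 to 6 are exactly the rows requested in the question. If the data could contain several "mutation" rows, target would have length greater than one and the comparison would need to become grp %in% target.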
NOTE: We can create the data.frame directly with data.frame. There is no need for cbind, which would adversely affect the column types: cbind converts its input to a matrix, and a matrix can hold only a single type, so the logical column would become character.
mx = data.frame(mutation = c("-", "-", "-", "-", "mutation", "+", "+", "+", "+"),
distance = c(F, T, F, F, F, F, T, F, T))
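The type coercion can be checked on a small case. This is a short sketch, with m and df as illustrative names only:

```r
# cbind on a character vector and a logical vector yields a character matrix
m <- cbind(c("-", "mutation"), c(FALSE, TRUE))
is.character(m)
# TRUE
m[, 2]
# "FALSE" "TRUE"

# data.frame keeps each column's own type
df <- data.frame(mutation = c("-", "mutation"), distance = c(FALSE, TRUE))
is.logical(df$distance)
# TRUE
```

Because cumsum and the ! operator need a logical (or numeric) column, the filter only works when 'distance' keeps its logical type.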