Let's say I have the following data.frame, where pos is a position coordinate. I've included a logical variable thresh that is TRUE where val is greater than a given threshold t.
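set.seed(123) # output below matches R < 3.6.0; on R >= 3.6.0 use set.seed(123, sample.kind = "Rounding") to reproduce it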
n <- 20
t <- 0
DF <- data.frame(pos = seq(from = 0, by = 0.3, length.out = n),
val = sample(-2:5, size = n, replace = TRUE))
DF$thresh <- DF$val > t
DF
## pos val thresh
## 1 0.0 0 FALSE
## 2 0.3 4 TRUE
## 3 0.6 1 TRUE
## 4 0.9 5 TRUE
## 5 1.2 5 TRUE
## 6 1.5 -2 FALSE
## 7 1.8 2 TRUE
## 8 2.1 5 TRUE
## 9 2.4 2 TRUE
## 10 2.7 1 TRUE
## 11 3.0 5 TRUE
## 12 3.3 1 TRUE
## 13 3.6 3 TRUE
## 14 3.9 2 TRUE
## 15 4.2 -2 FALSE
## 16 4.5 5 TRUE
## 17 4.8 -1 FALSE
## 18 5.1 -2 FALSE
## 19 5.4 0 FALSE
## 20 5.7 5 TRUE
How can I get the coordinates of the regions where val is positive? In the example above, these would be:
0.3 - 1.2,
1.8 - 3.9,
4.5 - 4.5,
5.7 - 5.7
I have thought of splitting the data.frame by thresh and then taking pos from the first and last row of each list element, but that just lumps all the TRUE rows into one subset and all the FALSE rows into another. Is there a way to turn thresh into a grouping variable that labels each consecutive run of TRUE values separately and discards the FALSE values?
split(DF, DF$thresh) # not what I want
## $`FALSE`
## pos val thresh
## 1 0.0 0 FALSE
## 6 1.5 -2 FALSE
## 15 4.2 -2 FALSE
## 17 4.8 -1 FALSE
## 18 5.1 -2 FALSE
## 19 5.4 0 FALSE
##
## $`TRUE`
## pos val thresh
## 2 0.3 4 TRUE
## 3 0.6 1 TRUE
## 4 0.9 5 TRUE
## 5 1.2 5 TRUE
## 7 1.8 2 TRUE
## 8 2.1 5 TRUE
## 9 2.4 2 TRUE
## 10 2.7 1 TRUE
## 11 3.0 5 TRUE
## 12 3.3 1 TRUE
## 13 3.6 3 TRUE
## 14 3.9 2 TRUE
## 16 4.5 5 TRUE
## 20 5.7 5 TRUE
Another clunky thing I tried was cumsum, but again it includes the FALSE rows:
split(DF, cumsum(DF$thresh == 0)) # not what I want but close to it...
## $`1`
## pos val thresh
## 1 0.0 0 FALSE
## 2 0.3 4 TRUE
## 3 0.6 1 TRUE
## 4 0.9 5 TRUE
## 5 1.2 5 TRUE
##
## $`2`
## pos val thresh
## 6 1.5 -2 FALSE
## 7 1.8 2 TRUE
## 8 2.1 5 TRUE
## 9 2.4 2 TRUE
## 10 2.7 1 TRUE
## 11 3.0 5 TRUE
## 12 3.3 1 TRUE
## 13 3.6 3 TRUE
## 14 3.9 2 TRUE
##
## $`3`
## pos val thresh
## 15 4.2 -2 FALSE
## 16 4.5 5 TRUE
##
## $`4`
## pos val thresh
## 17 4.8 -1 FALSE
##
## $`5`
## pos val thresh
## 18 5.1 -2 FALSE
##
## $`6`
## pos val thresh
## 19 5.4 0 FALSE
## 20 5.7 5 TRUE
Here is one option with data.table. We create a grouping variable using rleid, subset 'pos' based on 'thresh', and split.
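library(data.table)
# setDT() converts DF by reference; rleid() numbers each run of identical thresh values
DT <- setDT(DF)[, pos[thresh], by = .(gr = rleid(thresh))]
split(DT$V1, DT$gr)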
#$`2`
#[1] 0.3 0.6 0.9 1.2
#$`4`
#[1] 1.8 2.1 2.4 2.7 3.0 3.3 3.6 3.9
#$`6`
#[1] 4.5
#$`8`
#[1] 5.7
Or we can use rle from base R to create the grouping variable and then split based on that:
gr <- inverse.rle(within.list(rle(DF$thresh), values <- seq_along(values)))
with(DF, split(pos[thresh], gr[thresh]))
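To make the rle step easier to follow, here is a minimal sketch of what the grouping variable looks like. It hard-codes the val column printed in the question so the result does not depend on the random number generator, and it builds the run ids with rep() instead of inverse.rle(), which should give the same vector. The names run_id and regions_rle are only illustrative.

```r
# Values copied from the printed DF in the question
val <- c(0, 4, 1, 5, 5, -2, 2, 5, 2, 1, 5, 1, 3, 2, -2, 5, -1, -2, 0, 5)
pos <- seq(from = 0, by = 0.3, length.out = length(val))
thresh <- val > 0

# rle() compresses thresh into runs of identical values
r <- rle(thresh)

# Give every run its own id and expand back to one id per row
# (illustrative alternative to the inverse.rle() call above)
run_id <- rep(seq_along(r$lengths), r$lengths)

# Keep only the TRUE rows, grouped by the run they belong to
regions_rle <- split(pos[thresh], run_id[thresh])
regions_rle
```

The list names are the run ids of the TRUE runs (2, 4, 6 and 8 here), which is why they skip the odd numbers taken by the FALSE runs.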
Or, as @thelatemail mentioned, cumsum can also be used for grouping after subsetting by 'thresh':
with(DF, split(pos[thresh],cumsum(!thresh)[thresh]))
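Each of the approaches above returns the positions inside every region, while the question asks for the start and end of each region. One possible way to get there, sketched here with the cumsum grouping and the val column copied from the question, is to take the minimum and maximum of each list element. The names regions and ranges are only illustrative, and the sketch assumes at least one row is above the threshold.

```r
# Values copied from the printed DF in the question
val <- c(0, 4, 1, 5, 5, -2, 2, 5, 2, 1, 5, 1, 3, 2, -2, 5, -1, -2, 0, 5)
pos <- seq(from = 0, by = 0.3, length.out = length(val))
thresh <- val > 0

# cumsum(!thresh) increases at every FALSE row, so all rows of one
# TRUE run share the same counter value
regions <- split(pos[thresh], cumsum(!thresh)[thresh])

# First and last coordinate of each region (pos is increasing,
# so min/max are the first and last elements)
ranges <- data.frame(
  start = vapply(regions, min, numeric(1)),
  end   = vapply(regions, max, numeric(1))
)
ranges
```

For the example data this should give the four regions listed in the question, with single-row regions having the same start and end.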