I have a vector of numeric elements, and a dataframe with two columns that define the start and end points of intervals. Each row in the dataframe is one interval. I want to find out which interval each element in the vector belongs to.
Here's some example data:
# Find which interval each element of the vector belongs to
library(tidyverse)
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
intervals <- tribble(~phase, ~start, ~end,
                     "a", 0, 0.5,
                     "b", 1, 1.9,
                     "c", 2, 2.5)
The same example data for those who object to the tidyverse:
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
intervals <- structure(list(phase = c("a", "b", "c"),
start = c(0, 1, 2),
end = c(0.5, 1.9, 2.5)),
.Names = c("phase", "start", "end"),
row.names = c(NA, -3L),
class = "data.frame")
Here's one way to do it:
library(intrval)
phases_for_elements <-
map(elements, ~.x %[]% data.frame(intervals[, c('start', 'end')])) %>%
map(., ~unlist(intervals[.x, 'phase']))
Here's the output:
[[1]]
phase
"a"
[[2]]
phase
"a"
[[3]]
phase
"a"
[[4]]
character(0)
[[5]]
phase
"b"
[[6]]
phase
"b"
[[7]]
phase
"c"
But I'm looking for a simpler method with less typing. I've seen findInterval in related questions, but I'm not sure how I can use it in this situation.
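For what it's worth, here is a rough sketch of how findInterval might be applied to this data. It is only an illustration, and it assumes the intervals are sorted by start and do not overlap; the names idx, inside and phases are made up for the example.

```r
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
intervals <- data.frame(phase = c("a", "b", "c"),
                        start = c(0, 1, 2),
                        end = c(0.5, 1.9, 2.5),
                        stringsAsFactors = FALSE)

# Index of the last interval whose start is <= the element
# (0 means the element lies before the first start)
idx <- findInterval(elements, intervals$start)
idx[idx == 0] <- NA

# findInterval ignores the end points, so check them separately
inside <- !is.na(idx) & elements <= intervals$end[idx]

# Elements that fall in a gap between intervals get NA
phases <- ifelse(inside, intervals$phase[idx], NA_character_)
phases
# [1] "a" "a" "a" NA  "b" "b" "c"
```

The second step matters because findInterval only looks at one vector of break points; without the check against end, 0.9 would be assigned to "a".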
David Arenburg's mention of non-equi joins was very helpful for understanding what general kind of problem this is (thanks!). I can see now that it's not implemented for dplyr. Thanks to this answer, I see that there is a fuzzyjoin package that can do it in the same idiom. But it's barely any simpler than my map solution above (though more readable, in my view), and doesn't hold a candle to thelatemail's cut answer for brevity.
For my example above, the fuzzyjoin solution would be:
library(fuzzyjoin)
library(tidyverse)
fuzzy_left_join(data.frame(elements), intervals,
by = c("elements" = "start", "elements" = "end"),
match_fun = list(`>=`, `<=`)) %>%
distinct()
Which gives:
elements phase start end
1 0.1 a 0 0.5
2 0.2 a 0 0.5
3 0.5 a 0 0.5
4 0.9 <NA> NA NA
5 1.1 b 1 1.9
6 1.9 b 1 1.9
7 2.1 c 2 2.5
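For comparison, here is a minimal sketch of how cut could be used on the same data. This is my own illustration of the idea and not necessarily the code from thelatemail's answer; the names breaks, bin and phase_cut are invented for the example. It assumes the intervals are sorted, do not overlap and do not share boundaries.

```r
elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
intervals <- data.frame(phase = c("a", "b", "c"),
                        start = c(0, 1, 2),
                        end = c(0.5, 1.9, 2.5),
                        stringsAsFactors = FALSE)

# Interleave start and end points: 0, 0.5, 1, 1.9, 2, 2.5
breaks <- c(rbind(intervals$start, intervals$end))

# Integer bin codes; odd bins are real intervals, even bins are the gaps
bin <- cut(elements, breaks, labels = FALSE, include.lowest = TRUE)

# Map odd bins back to their phase, everything else to NA
phase_cut <- ifelse(bin %% 2L == 1L,
                    intervals$phase[(bin + 1L) %/% 2L],
                    NA_character_)
phase_cut
# [1] "a" "a" "a" NA  "b" "b" "c"
```

One caveat: with the default right = TRUE, an element exactly equal to a start point other than the first (1 or 2 here) lands in the preceding gap bin, so this sketch does not treat every interval as closed on both sides the way the fuzzyjoin version does.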
Here is a kind of "one-liner" that (mis-)uses foverlaps from the data.table package, though David's non-equi join is still more concise:
library(data.table) #v1.10.0
foverlaps(data.table(start = elements, end = elements),
setDT(intervals, key = c("start", "end")))
# phase start end i.start i.end
#1: a 0 0.5 0.1 0.1
#2: a 0 0.5 0.2 0.2
#3: a 0 0.5 0.5 0.5
#4: NA NA NA 0.9 0.9
#5: b 1 1.9 1.1 1.1
#6: b 1 1.9 1.9 1.9
#7: c 2 2.5 2.1 2.1
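Since the non-equi join is mentioned but not shown here, the following is a sketch of what such a join can look like in data.table (version 1.9.8 or later). It is written as an update join and is not necessarily identical to David's code; the table name pts is made up for the example.

```r
library(data.table)

elements <- c(0.1, 0.2, 0.5, 0.9, 1.1, 1.9, 2.1)
intervals <- data.table(phase = c("a", "b", "c"),
                        start = c(0, 1, 2),
                        end = c(0.5, 1.9, 2.5))

# One row per element
pts <- data.table(elements = elements)

# For each interval, find the rows of pts with start <= elements <= end
# and write that interval's phase into them by reference
pts[intervals, on = .(elements >= start, elements <= end), phase := i.phase]

# Elements outside every interval keep phase NA
pts$phase
# [1] "a" "a" "a" NA  "b" "b" "c"
```

Because the join updates pts by reference, the original elements column is left untouched and only the phase column is added.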