Check if column value is in between (range) of two other column values

Question

I have a data frame that looks like this (Dataframe X):

id  number  found
1   5225    NA
2   2222    NA
3   3121    NA

I have another data frame that looks like this (Dataframe Y):

id  number1  number2    
1   4000     6000
3   2500     3300
3   7000     8000

What I want to do is this: For each value in the Dataframe X "number" column, search if it is equal to or between ANY of the "number1" and "number2" pair values of Dataframe Y. Additionally, for this "number1" and "number2" pair values, its respective "id" must match the "id" in Dataframe X. If this is all true, then I want to insert a "YES in the "found" column of the respective row in Dataframe X:

id  number  found
1   5225    YES
2   2222    NA
3   3121    YES

How would I go about doing this? Thanks for the help.

akrun · Accepted Answer

Here is an option using fuzzy_join

library(fuzzy_join)
library(dplyr)
fuzzy_left_join(X, Y[-1], by = c("number" = "number1", "number" = "number2"), 
     match_fun  =list(`>=`, `<=`)) %>% 
    mutate(found = c(NA, "YES")[(!is.na(number1)) + 1]) %>% 
    select(names(X))
#    id number found
#1  1   5225   YES
#2  2   2222  <NA>
#3  3   3121   YES

Or another option is a non-equi join with data.table

library(data.table)
setDT(X)[, found := NULL]
X[Y, found := "YES", on = .(number >= number1, number <= number2)]
X
#   id number found
#1:  1   5225   YES
#2:  2   2222  <NA>
#3:  3   3121   YES

data

X <- structure(list(id = 1:3, number = c(5225L, 2222L, 3121L), found = c(NA, 
  NA, NA)), class = "data.frame", row.names = c(NA, -3L))

Y <- structure(list(id = 1:3, number1 = c(4000L, 2500L, 7000L), number2 = c(6000L, 
    3300L, 8000L)), class = "data.frame", row.names = c(NA, -3L))

Ronak Shah · Answer

We can loop over each x$number using sapply and check if it lies in range of any of y$number1 and y$number2 and give the value accordingly.

x$found <- ifelse(sapply(x$number, function(p) 
                 any(y$number1 <= p & y$number2 >= p)),"YES", NA)
x

#  id number found
#1  1   5225   YES
#2  2   2222  <NA>
#3  3   3121   YES

Using the same logic but with replace

x$found <- replace(x$found, 
         sapply(x$number, function(p) any(y$number1 <= p & y$number2 >= p)), "YES")

EDIT

If we want to also compare the id value we could do

x$found <- ifelse(sapply(seq_along(x$number), function(i) {
           inds <- y$number1 <= x$number[i] & y$number2 >= x$number[i]
           any(inds) & (x$id[i] == y$id[which.max(inds)])
           }), "YES", NA)

x$found
#[1] "YES" NA    "YES"

Calum You · Answer

Using tidyverse functions, especially map_chr to iterate over each number:

library(tidyverse)
tbl1 <- read_table2(
"id   number  found
1    5225     NA
2    2222     NA
3    3121     NA"
)
tbl2 <- read_table2(
"id  number1  number2
1    4000   6000
2    2500   3300
3    7000   8000"
)

tbl1 %>%
  mutate(found = map_chr(
    .x = number,
    .f = ~ if_else(
      condition = any(.x > tbl2$number1 & .x < tbl2$number2),
      true = "YES",
      false = NA_character_
    )
  ))
#> # A tibble: 3 x 3
#>      id number found
#>   <int>  <int> <chr>
#> 1     1   5225 YES  
#> 2     2   2222 <NA> 
#> 3     3   3121 YES

Created on 2018-10-18 by the reprex package (v0.2.0).

Tim Biegeleisen · Answer

Using sqldf:

library(sqldf)
sql <- "SELECT DISTINCT x.id, x.number, "
sql <- paste0(sql, "CASE WHEN y.id IS NOT NULL THEN 'YES' END AS found ")
sql <- paste0(sql, "FROM X x LEFT JOIN Y y ON x.number BETWEEN y.number1 AND y.number2")
X <- sqldf(sql)

enter image description here

Check if column value is in between (range) of two other column values

Tags:

r

Tony

4 Answers

data

akrun

Ronak Shah

Calum You

Tim Biegeleisen

Recent Activity

Donate For Us

Check if column value is in between (range) of two other column values

Tags:

r

Tony

4 Answers

data

akrun

Ronak Shah

Calum You

Tim Biegeleisen

Related questions

Recent Activity

Donate For Us