Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check if column value is in between (range) of two other column values

Tags:

r

I have a data frame that looks like this (Dataframe X):

id  number  found
1   5225    NA
2   2222    NA
3   3121    NA

I have another data frame that looks like this (Dataframe Y):

id  number1  number2    
1   4000     6000
3   2500     3300
3   7000     8000

What I want to do is this: For each value in the Dataframe X "number" column, search if it is equal to or between ANY of the "number1" and "number2" pair values of Dataframe Y. Additionally, for this "number1" and "number2" pair values, its respective "id" must match the "id" in Dataframe X. If this is all true, then I want to insert a "YES in the "found" column of the respective row in Dataframe X:

id  number  found
1   5225    YES
2   2222    NA
3   3121    YES

How would I go about doing this? Thanks for the help.

like image 795
Tony Avatar asked Oct 19 '18 03:10

Tony


4 Answers

Here is an option using fuzzy_join

library(fuzzy_join)
library(dplyr)
fuzzy_left_join(X, Y[-1], by = c("number" = "number1", "number" = "number2"), 
     match_fun  =list(`>=`, `<=`)) %>% 
    mutate(found = c(NA, "YES")[(!is.na(number1)) + 1]) %>% 
    select(names(X))
#    id number found
#1  1   5225   YES
#2  2   2222  <NA>
#3  3   3121   YES

Or another option is a non-equi join with data.table

library(data.table)
setDT(X)[, found := NULL]
X[Y, found := "YES", on = .(number >= number1, number <= number2)]
X
#   id number found
#1:  1   5225   YES
#2:  2   2222  <NA>
#3:  3   3121   YES

data

X <- structure(list(id = 1:3, number = c(5225L, 2222L, 3121L), found = c(NA, 
  NA, NA)), class = "data.frame", row.names = c(NA, -3L))

Y <- structure(list(id = 1:3, number1 = c(4000L, 2500L, 7000L), number2 = c(6000L, 
    3300L, 8000L)), class = "data.frame", row.names = c(NA, -3L))
like image 141
akrun Avatar answered Nov 12 '22 20:11

akrun


We can loop over each x$number using sapply and check if it lies in range of any of y$number1 and y$number2 and give the value accordingly.

x$found <- ifelse(sapply(x$number, function(p) 
                 any(y$number1 <= p & y$number2 >= p)),"YES", NA)
x

#  id number found
#1  1   5225   YES
#2  2   2222  <NA>
#3  3   3121   YES

Using the same logic but with replace

x$found <- replace(x$found, 
         sapply(x$number, function(p) any(y$number1 <= p & y$number2 >= p)), "YES")

EDIT

If we want to also compare the id value we could do

x$found <- ifelse(sapply(seq_along(x$number), function(i) {
           inds <- y$number1 <= x$number[i] & y$number2 >= x$number[i]
           any(inds) & (x$id[i] == y$id[which.max(inds)])
           }), "YES", NA)

x$found
#[1] "YES" NA    "YES"
like image 30
Ronak Shah Avatar answered Nov 12 '22 20:11

Ronak Shah


Using tidyverse functions, especially map_chr to iterate over each number:

library(tidyverse)
tbl1 <- read_table2(
"id   number  found
1    5225     NA
2    2222     NA
3    3121     NA"
)
tbl2 <- read_table2(
"id  number1  number2
1    4000   6000
2    2500   3300
3    7000   8000"
)

tbl1 %>%
  mutate(found = map_chr(
    .x = number,
    .f = ~ if_else(
      condition = any(.x > tbl2$number1 & .x < tbl2$number2),
      true = "YES",
      false = NA_character_
    )
  ))
#> # A tibble: 3 x 3
#>      id number found
#>   <int>  <int> <chr>
#> 1     1   5225 YES  
#> 2     2   2222 <NA> 
#> 3     3   3121 YES

Created on 2018-10-18 by the reprex package (v0.2.0).

like image 6
Calum You Avatar answered Nov 12 '22 18:11

Calum You


Using sqldf:

library(sqldf)
sql <- "SELECT DISTINCT x.id, x.number, "
sql <- paste0(sql, "CASE WHEN y.id IS NOT NULL THEN 'YES' END AS found ")
sql <- paste0(sql, "FROM X x LEFT JOIN Y y ON x.number BETWEEN y.number1 AND y.number2")
X <- sqldf(sql)

enter image description here

like image 4
Tim Biegeleisen Avatar answered Nov 12 '22 19:11

Tim Biegeleisen