
R: Check if value from dataframe is within range other dataframe

Tags:

r

I am looking for a way to look up information from one dataframe in another dataframe, get a value from that other dataframe, and pass it back to the first dataframe.

example data:

I've got a dataframe named "x"

x <- structure(list(from = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L
), to = c(2L, 3L, 4L, 5L, 6L, 2L, 3L, 4L, 5L, 6L), number = c(30, 
30, 30, 33, 34, 35, 36, 37, 38, 39), name = c("region 1", "region 2", 
"region 3", "region 4", "region 5", "region 6", "region 7", "region 8", 
"region 9", "region 10")), .Names = c("from", "to", "number", 
"name"), row.names = c(NA, -10L), class = "data.frame")

#   from to number      name
#1     1  2     30  region 1
#2     2  3     30  region 2
#3     3  4     30  region 3
#4     4  5     33  region 4
#5     5  6     34  region 5
#6     1  2     35  region 6
#7     2  3     36  region 7
#8     3  4     37  region 8
#9     4  5     38  region 9
#10    5  6     39 region 10

This dataframe holds information about certain regions (1-10).

I've got another dataframe "y"

y <- structure(list(location = c(1.5, 2.8, 10, 3.5, 2), id_number = 
c(30, 30, 38, 40, 36)), .Names = c("location", "id_number"), row.names 
= c(NA, -5L), class = "data.frame")

#  location id_number
#1      1.5        30
#2      2.8        30
#3     10.0        38
#4      3.5        40
#5      2.0        36

This one contains information about locations.

What I need is a function (or command, or whatever I can throw at R ;-) ) that, for every row in y, checks whether y$location falls between x$from and x$to AND y$id_number == x$number. If a match is found (a location in y can only fall in one row of x, or in none; it is impossible for a row of y to match two rows of x), return x$name to a new column in y, named "name".

desired output:

#  location id_number     name
#1      1.5        30 region 1
#2      2.8        30 region 2
#3     10.0        38     <NA>
#4      3.5        40     <NA>
#5      2.0        36 region 7

I'm pretty new to R, so my first idea was to use for-loops to tackle this problem (as I'm used to doing in VB). But then I thought: "noooooo", I have to vectorise it, like all the people are telling me good R programmers do ;-)

So I came up with a function, and called it with adply (from the plyr-package). Problem is: It does not work, throws me an error I don't understand, and now I'm stuck...

Can anyone point me in the right direction?

require("dplyr")

getValue <- function(y, x) {
 tmp <- x %>%
   filter(from <= y$location, to > y$location, number == y$id_number)
 return(tmp$name)
}

y["name"] <- adply(y, 1, getValue, x=x)
Wimpel asked Sep 16 '16 17:09


3 Answers

Here's a simple base method that uses the OP's logic:

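# return the row index in x that matches a location and id, or NA if there is none
f <- function(vec, id) {
  idx <- which(vec >= x$from & vec <= x$to & id == x$number)
  if (length(idx)) idx else NA
}
# index x$name by the matched row for each row of y (an NA index gives an NA name)
y$name <- x$name[mapply(f, y$location, y$id_number)]
y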
#  location id_number     name
#1      1.5        30 region 1
#2      2.8        30 region 2
#3     10.0        38     <NA>
#4      3.5        40     <NA>
#5      2.0        36 region 7
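A minimal sketch of the same lookup wrapped so that both data frames are passed as arguments instead of reading x from the global environment. The helper name lookup_name is hypothetical, and taking only the first hit is an assumption that only matters if a row of y could ever match more than one row of x.

```r
x <- data.frame(
  from = rep(1:5, 2), to = rep(2:6, 2),
  number = c(30, 30, 30, 33, 34, 35, 36, 37, 38, 39),
  name = paste("region", 1:10),
  stringsAsFactors = FALSE
)
y <- data.frame(location = c(1.5, 2.8, 10, 3.5, 2),
                id_number = c(30, 30, 38, 40, 36))

# lookup_name is a hypothetical helper introduced for this sketch
lookup_name <- function(y, x) {
  idx <- mapply(function(loc, id) {
    # rows of x whose range contains loc and whose number equals id
    hit <- which(loc >= x$from & loc <= x$to & id == x$number)
    # assumption: keep the first hit so the result always has length one
    if (length(hit)) hit[1] else NA_integer_
  }, y$location, y$id_number)
  x$name[idx]
}

y$name <- lookup_name(y, x)
```

Because each call returns exactly one index, mapply always simplifies to a plain integer vector, so the final subsetting step does not depend on the data having at most one match per row.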
Pierre L answered Sep 28 '22 07:09


Since you want to match the id_number and number columns, you can join x and y on those columns and then set name to NA where the location doesn't fall between from and to. Here is a dplyr option:

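library(dplyr)
y %>% left_join(x, by = c("id_number" = "number")) %>%
      # keep the name only where location lies between from and to
      mutate(name = if_else(location >= from & location <= to, as.character(name), NA_character_)) %>%
      # arrange() sorts NA names last, so distinct() keeps a real match when one exists
      select(-from, -to) %>% arrange(name) %>%
      distinct(location, id_number, .keep_all = TRUE)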

#   location id_number     name
# 1      1.5        30 region 1
# 2      2.8        30 region 2
# 3      2.0        36 region 7
# 4     10.0        38     <NA>
# 5      3.5        40     <NA>
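The join-then-filter idea can also be sketched in base R with merge(). This is only an illustration under a few assumptions: the temporary row column and the names m and hit are introduced here, and the data is the example data from the question. The row column is used to put the names back in y's original order, which the pipeline above does not preserve.

```r
x <- data.frame(
  from = rep(1:5, 2), to = rep(2:6, 2),
  number = c(30, 30, 30, 33, 34, 35, 36, 37, 38, 39),
  name = paste("region", 1:10),
  stringsAsFactors = FALSE
)
y <- data.frame(location = c(1.5, 2.8, 10, 3.5, 2),
                id_number = c(30, 30, 38, 40, 36))

# temporary row id (hypothetical column) to remember y's original order
y$row <- seq_len(nrow(y))

# left join: every row of y paired with all rows of x sharing the same number
m <- merge(y, x, by.x = "id_number", by.y = "number", all.x = TRUE)

# TRUE only where a partner row exists and the location lies within [from, to]
hit <- !is.na(m$from) & m$location >= m$from & m$location <= m$to

# for each original row of y, pick the matching name, or NA when no pair qualified
y$name <- m$name[hit][match(y$row, m$row[hit])]
y$row <- NULL
```

match() returns the first qualifying pair for each row of y, so this relies on the question's guarantee that a location falls in at most one row of x.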
Psidom answered Sep 28 '22 09:09


Another base method (mostly):

# magrittr provides %>% for the last step - if you don't use magrittr, just wrap the sapply around the lapply
library(magrittr)

# for each location in y, a logical vector marking which from/to ranges in x contain it
locationok <- lapply(y$location, function(z) z >= x$from & z <= x$to)

# for each id_number in y, a logical vector marking which rows of x have the same number
idok <- lapply(y$id_number, function(z) z == x$number)

# combine the two lists and use the combined vectors as an index on x$name
lapply(1:nrow(y), function(i) {
      x$name[ locationok[[i]] & idok[[i]] ]
}) %>%
# replace zero-length results with NA values
sapply(function(x) ifelse(length(x) == 0, NA, x))
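The two lists of logical vectors can also be viewed as one logical matrix with a row per row of y and a column per row of x. A rough sketch of that variant using outer(), with the example data from the question; the names in_range, same_id, hit and idx are introduced here for illustration.

```r
x <- data.frame(
  from = rep(1:5, 2), to = rep(2:6, 2),
  number = c(30, 30, 30, 33, 34, 35, 36, 37, 38, 39),
  name = paste("region", 1:10),
  stringsAsFactors = FALSE
)
y <- data.frame(location = c(1.5, 2.8, 10, 3.5, 2),
                id_number = c(30, 30, 38, 40, 36))

# nrow(y) x nrow(x) matrices: is the location inside each range, does the id match
in_range <- outer(y$location, x$from, ">=") & outer(y$location, x$to, "<=")
same_id  <- outer(y$id_number, x$number, "==")
hit <- in_range & same_id

# per row of y: index of the first matching row of x, or NA when there is none
idx <- apply(hit, 1, function(r) if (any(r)) which(r)[1] else NA_integer_)
y$name <- x$name[idx]
```

The matrix holds nrow(y) * nrow(x) values, so this trades memory for avoiding the per-row lists; it is probably only reasonable for data frames of modest size.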
crazybilly answered Sep 28 '22 08:09