I am looking for a way to look up information from one dataframe in another dataframe, get a value from that other dataframe and pass it back to the first frame.
Example data:
I've got a dataframe named "x"
x <- structure(list(from = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L
), to = c(2L, 3L, 4L, 5L, 6L, 2L, 3L, 4L, 5L, 6L), number = c(30,
30, 30, 33, 34, 35, 36, 37, 38, 39), name = c("region 1", "region 2",
"region 3", "region 4", "region 5", "region 6", "region 7", "region 8",
"region 9", "region 10")), .Names = c("from", "to", "number",
"name"), row.names = c(NA, -10L), class = "data.frame")
# from to number name
#1 1 2 30 region 1
#2 2 3 30 region 2
#3 3 4 30 region 3
#4 4 5 33 region 4
#5 5 6 34 region 5
#6 1 2 35 region 6
#7 2 3 36 region 7
#8 3 4 37 region 8
#9 4 5 38 region 9
#10 5 6 39 region 10
This dataframe holds information about certain regions (1-10).
I've got another dataframe "y"
y <- structure(list(location = c(1.5, 2.8, 10, 3.5, 2), id_number = c(30,
30, 38, 40, 36)), .Names = c("location", "id_number"), row.names = c(NA,
-5L), class = "data.frame")
# location id_number
#1 1.5 30
#2 2.8 30
#3 10.0 38
#4 3.5 40
#5 2.0 36
This one contains information about locations.
What I need is a function (or command, or whatever I can throw at R ;-) ) that, for every row in y, looks up whether y$location falls between x$from and x$to AND y$id_number == x$number. If a match is found (a location in y can only fall in one row of x, or in none; it is impossible for a row of y to match two rows of x), return x$name to a new column in y named "name".
Desired output:
# location id_number name
#1 1.5 30 region 1
#2 2.8 30 region 2
#3 10.0 38 <NA>
#4 3.5 40 <NA>
#5 2.0 36 region 7
I'm pretty new to R, so my first idea was to use for-loops to tackle this problem (as I'm used to doing in VB). But then I thought: "noooooo", I have to vectorise it, like all the people are telling me good R programmers do ;-)
So I came up with a function and called it with adply (from the plyr package). Problem is: it does not work, it throws an error I don't understand, and now I'm stuck...
Can anyone point me in the right direction?
require("dplyr")
getValue <- function(y, x) {
tmp <- x %>%
filter(from <= y$location, to > y$location, number == y$id_number)
return(tmp$name)
}
y["name"] <- adply(y, 1, getValue, x=x)
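One detail worth settling before picking a solution: the filter() call above treats each interval as half-open (from <= location < to), while a closed comparison (from <= location <= to) lets a location lying exactly on a boundary shared by two regions with the same number match both rows, which would break the "one row or none" assumption. The sketch below checks this for a hypothetical location 2 with id 30 (not one of the rows of y); x is rebuilt with data.frame(), which should give the same columns as the structure() call above.

```r
# Same lookup table as x in the question, built with data.frame() for brevity
x <- data.frame(from = rep(1:5, 2), to = rep(2:6, 2),
                number = c(30, 30, 30, 33, 34, 35, 36, 37, 38, 39),
                name = paste("region", 1:10), stringsAsFactors = FALSE)

loc <- 2   # hypothetical location sitting on the 1-2 / 2-3 boundary
id  <- 30  # hypothetical id_number shared by rows 1-3 of x

# Closed intervals [from, to]: the boundary location matches two rows
closed <- which(loc >= x$from & loc <= x$to & id == x$number)
# Half-open intervals [from, to), as in the filter() call: only one row
half_open <- which(loc >= x$from & loc < x$to & id == x$number)

closed     # rows 1 and 2
half_open  # row 2 only
```

With the example data none of the rows of y sits on such a boundary, so both conventions give the desired output; with real data the choice matters.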
Here's a simple base method that uses the OP's logic:
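# For one location/id pair, return the index of the matching row of x, or NA
f <- function(vec, id) {
  if (length(.x <- which(vec >= x$from & vec <= x$to & id == x$number))) .x else NA
}
# mapply calls f once per row of y; the indices then pick the names from x
y$name <- x$name[mapply(f, y$location, y$id_number)]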
y
# location id_number name
#1 1.5 30 region 1
#2 2.8 30 region 2
#3 10.0 38 <NA>
#4 3.5 40 <NA>
#5 2.0 36 region 7
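f above reads x from the global environment, and if a location ever matched more than one row, mapply would return a list rather than a vector of indices. A slightly more defensive variant of the same logic (a sketch; match_row is a hypothetical name) passes the lookup table explicitly, keeps only the first match, and uses vapply so the result is guaranteed to be one integer per row of y:

```r
x <- data.frame(from = rep(1:5, 2), to = rep(2:6, 2),
                number = c(30, 30, 30, 33, 34, 35, 36, 37, 38, 39),
                name = paste("region", 1:10), stringsAsFactors = FALSE)
y <- data.frame(location = c(1.5, 2.8, 10, 3.5, 2),
                id_number = c(30, 30, 38, 40, 36))

# Hypothetical helper: index of the row of `lookup` whose [from, to] interval
# contains `loc` and whose number equals `id`; NA_integer_ if there is none
match_row <- function(loc, id, lookup) {
  idx <- which(loc >= lookup$from & loc <= lookup$to & id == lookup$number)
  if (length(idx) > 0) idx[1] else NA_integer_
}

# vapply enforces exactly one integer result per row of y
rows <- vapply(seq_len(nrow(y)),
               function(i) match_row(y$location[i], y$id_number[i], x),
               integer(1))
y$name <- x$name[rows]
y
```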
Since you want to match the columns id_number and number, you can join x and y on those columns and then set name to NA wherever the location doesn't fall between from and to. Here is a dplyr option:
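library(dplyr)
y %>%
  # one row per (row of y, row of x with the same number)
  left_join(x, by = c("id_number" = "number")) %>%
  # keep the name only where the location lies inside [from, to]
  mutate(name = if_else(location >= from & location <= to, as.character(name), NA_character_)) %>%
  select(-from, -to) %>%
  # NA names sort last, so distinct() keeps the matching row when there is one
  arrange(name) %>%
  distinct(location, id_number, .keep_all = TRUE)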
# location id_number name
# 1 1.5 30 region 1
# 2 2.8 30 region 2
# 3 2.0 36 region 7
# 4 10.0 38 <NA>
# 5 3.5 40 <NA>
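The same join-then-filter idea also works in base R with merge(). The sketch below (row_id is a hypothetical helper column) tags each row of y before joining, so the result can be written back in y's original row order instead of being re-sorted by name:

```r
x <- data.frame(from = rep(1:5, 2), to = rep(2:6, 2),
                number = c(30, 30, 30, 33, 34, 35, 36, 37, 38, 39),
                name = paste("region", 1:10), stringsAsFactors = FALSE)
y <- data.frame(location = c(1.5, 2.8, 10, 3.5, 2),
                id_number = c(30, 30, 38, 40, 36))

# Remember each row's original position before joining
y2 <- y
y2$row_id <- seq_len(nrow(y))

# Left join: every row of y paired with every row of x that has the same number
m <- merge(y2, x, by.x = "id_number", by.y = "number", all.x = TRUE)

# TRUE where the joined interval contains the location (FALSE when nothing joined)
inside <- !is.na(m$from) & m$location >= m$from & m$location <= m$to

# At most one matching row per row_id; map the names back by position
matched <- m[inside, c("row_id", "name")]
y$name <- matched$name[match(seq_len(nrow(y)), matched$row_id)]
y
```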
Another base method (mostly):
# we need this for the last line - if you don't use magrittr, just wrap the sapply around the lapply
library(magrittr)
# a list with one logical vector per row of y: is that location inside each from/to interval of x?
locationok <- lapply(y$location, function(z) z >= x$from & z <= x$to)
# another list of logical vectors: does that row's id_number equal each number in x?
idok <- lapply(y$id_number, function(z) z == x$number)
# combine the two lists and use the combined vectors as an index on x$name
y$name <- lapply(1:nrow(y), function(i) {
  x$name[ locationok[[i]] & idok[[i]] ]
}) %>%
  # replace zero-length results (no matching row) with NA values
  sapply(function(x) ifelse(length(x) == 0, NA, x))
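All three answers still call a function once per row of y (via mapply or lapply). If both tables are small enough that an nrow(y) by nrow(x) logical matrix fits comfortably in memory, the comparisons themselves can be done in one vectorised step with outer(). This is a sketch of that alternative, using the same closed-interval test as the answers:

```r
x <- data.frame(from = rep(1:5, 2), to = rep(2:6, 2),
                number = c(30, 30, 30, 33, 34, 35, 36, 37, 38, 39),
                name = paste("region", 1:10), stringsAsFactors = FALSE)
y <- data.frame(location = c(1.5, 2.8, 10, 3.5, 2),
                id_number = c(30, 30, 38, 40, 36))

# hit[i, j] is TRUE when row i of y matches row j of x
hit <- outer(y$location, x$from, ">=") &
  outer(y$location, x$to, "<=") &
  outer(y$id_number, x$number, "==")

# First matching column per row of y; which() is empty for no match, so [1] gives NA
idx <- apply(hit, 1, function(r) which(r)[1])
y$name <- x$name[idx]
y
```

The trade-off is memory: the matrix grows with nrow(y) * nrow(x), so for large tables a join-based approach is likely the better fit.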