Simple lookup to insert values in an R data frame

This is a seemingly simple R question, but I don't see an exact answer here. I have a data frame (alldata) that looks like this:

Case     zip     market
1        44485   NA
2        44488   NA
3        43210   NA

There are over 3.5 million records.

Then, I have a second data frame, 'zipcodes'.

market    zip
1         44485
1         44486
1         44488
...       ... (100 zips in market 1)
2         43210
2         43211
...       ... (100 zips in market 2, etc.)

I want to fill in alldata$market for each case by matching alldata$zip against zipcodes$zip and taking the corresponding zipcodes$market. I'm just looking for the right syntax, and assistance is much appreciated, as usual.

Dino Fire asked Jul 24 '13 20:07


2 Answers

Here's the dplyr way of doing it:

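library(tidyverse)  # only dplyr is needed here; library(dplyr) also works

# Drop the all-NA market column, then pull market in from zipcodes by zip.
alldata <- alldata %>%
  select(-market) %>%
  left_join(zipcodes, by = "zip")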

On my machine, this performs roughly the same as the lookup() function from qdapTools.
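
For comparison, here is a minimal base R sketch of the same lookup using match() instead of a join. It is a different technique from the dplyr code above, not a restatement of it, and the toy data frames only mirror the rows shown in the question (the real data has millions of rows).

```r
# Toy versions of the two data frames, using only values shown in the question.
alldata <- data.frame(Case = 1:3,
                      zip = c(44485, 44488, 43210),
                      market = NA)
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485, 44486, 44488, 43210, 43211))

# match() gives, for each alldata$zip, the row index of the first equal
# zipcodes$zip, or NA when the zip is not present in zipcodes.
idx <- match(alldata$zip, zipcodes$zip)

# Index the market column by those positions; unmatched zips stay NA.
alldata$market <- zipcodes$market[idx]

print(alldata)
```

This assumes each zip appears at most once in zipcodes. If a zip were listed under two markets, match() would silently take the first one, whereas left_join() would duplicate the row.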

James Brusey answered Sep 29 '22 22:09


With such a large data set you may want the speed of an environment lookup. You can use the lookup function from the qdapTools package as follows:

library(qdapTools)
# zipcodes[, 2:1] reorders the columns to (zip, market): key first, value second.
alldata$market <- lookup(alldata$zip, zipcodes[, 2:1])

Or, equivalently, with the %l% binary operator:

alldata$market <- alldata$zip %l% zipcodes[, 2:1]
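
To show what "environment lookup" means here, below is a rough base R sketch of the idea: store each zip as a key in a hashed environment and fetch the markets with mget(). I'm not claiming this is how qdapTools implements lookup() internally. The name zip_env is made up for illustration, and the toy data only mirrors the rows in the question.

```r
# Toy versions of the two data frames, using only values shown in the question.
alldata <- data.frame(Case = 1:3,
                      zip = c(44485, 44488, 43210),
                      market = NA)
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485, 44486, 44488, 43210, 43211))

# Build a hashed environment with one entry per zip (key = zip as a string,
# value = its market). zip_env is a hypothetical name for this sketch.
zip_env <- new.env(hash = TRUE)
for (i in seq_len(nrow(zipcodes))) {
  assign(as.character(zipcodes$zip[i]), zipcodes$market[i], envir = zip_env)
}

# Fetch all markets in one call; zips missing from zip_env come back as NA.
found <- mget(as.character(alldata$zip), envir = zip_env,
              ifnotfound = list(NA_real_))
alldata$market <- unlist(found, use.names = FALSE)

print(alldata)
```

Each key lookup is a hash probe rather than a scan of the zipcodes table, which is where the speed on large data would come from.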

Tyler Rinker answered Sep 29 '22 23:09
