Simple lookup to insert values in an R data frame

Tags: r, lookup-tables

This is a seemingly simple R question, but I don't see an exact answer here. I have a data frame (alldata) that looks like this:

Case     zip     market
1        44485   NA
2        44488   NA
3        43210   NA

There are over 3.5 million records.

Then, I have a second data frame, 'zipcodes'.

market    zip
1         44485
1         44486
1         44488
...       ... (100 zips in market 1)
2         43210
2         43211
...       ... (100 zips in market 2, etc.)

I want to fill in the correct value of alldata$market for each case by matching alldata$zip against the zip column of the zipcodes data frame. I'm just looking for the right syntax, and assistance is much appreciated, as usual.


asked Jul 24 '13 20:07

Dino Fire

2 Answers

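Here's the dplyr way of doing it:

library(dplyr)  # library(tidyverse) also works, since it attaches dplyr
# Drop the all-NA market column, then pull market in from zipcodes by zip
alldata <- alldata %>%
  select(-market) %>%
  left_join(zipcodes, by = "zip")

On my machine, this performs roughly the same as lookup from qdapTools.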
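To try the join on something small first, here is a minimal sketch using toy data frames built from the sample rows in the question. The toy values stand in for the real 3.5-million-row data and are an assumption; the sketch also assumes zip has the same type in both data frames and that each zip appears only once in zipcodes (duplicates would multiply rows in the result).

```r
library(dplyr)

# Toy stand-ins for the real data, copied from the question's sample rows
alldata <- data.frame(
  Case   = 1:3,
  zip    = c(44485, 44488, 43210),
  market = NA
)
zipcodes <- data.frame(
  market = c(1, 1, 1, 2, 2),
  zip    = c(44485, 44486, 44488, 43210, 43211)
)

# Drop the empty market column, then bring market in from zipcodes by zip.
# left_join keeps every row of alldata, in its original order.
alldata <- alldata %>%
  select(-market) %>%
  left_join(zipcodes, by = "zip")

alldata$market
```

With this toy data the three cases should come back as markets 1, 1 and 2.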

answered Sep 29 '22 22:09

James Brusey

With such a large data set you may want the speed of an environment lookup. You can use the lookup function from the qdapTools package as follows:

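library(qdapTools)
# zipcodes[, 2:1] swaps the columns to (zip, market): key first, value second
alldata$market <- lookup(alldata$zip, zipcodes[, 2:1])

Or, with the equivalent binary operator, which returns the same vector of markets:

alldata$zip %l% zipcodes[, 2:1]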

answered Sep 29 '22 23:09

Tyler Rinker
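For comparison, the same lookup can be sketched in base R with match(), which needs no extra packages. This is not taken from either answer above; it is a swapped-in technique shown on toy data. The fourth zip (99999) is a made-up value added only to show what happens when a zip has no entry in zipcodes.

```r
# Toy stand-ins for the real data; zip 99999 is a hypothetical unmatched value
alldata <- data.frame(
  Case   = 1:4,
  zip    = c(44485, 44488, 43210, 99999),
  market = NA
)
zipcodes <- data.frame(
  market = c(1, 1, 1, 2, 2),
  zip    = c(44485, 44486, 44488, 43210, 43211)
)

# match() returns, for each alldata$zip, the row position of the first
# equal value in zipcodes$zip, or NA when there is none
idx <- match(alldata$zip, zipcodes$zip)

# Index the market column by those positions; unmatched zips stay NA
alldata$market <- zipcodes$market[idx]

alldata$market
```

Because match() works on whole vectors and leaves alldata's row order untouched, the result can be assigned straight back into the existing market column.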
