Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to do vlookup and fill down (like in Excel) in R?

Tags:

lookup

r

I have a dataset about 105000 rows and 30 columns. I have a categorical variable that I would like to assign it to a number. In Excel, I would probably do something with VLOOKUP and fill.

How would I go about doing the same thing in R?

Essentially, what I have is a HouseType variable, and I need to calculate the HouseTypeNo. Here are some sample data:

HouseType HouseTypeNo Semi            1 Single          2 Row             3 Single          2 Apartment       4 Apartment       4 Row             3 
like image 598
user2142810 Avatar asked Mar 08 '13 21:03

user2142810


People also ask

How do you do a VLOOKUP equivalent in R?

Method 2: Using dplyr To Perform VLOOKUP We can use the inner join function of the dplyr library in R to perform similar to the VLOOKUP function.

How do I lookup a value in R?

If you want to match approximately (perform a lookup), R has a function called findInterval , which (as the name implies) will find the interval / bin that contains your continuous numeric value. However, let's say that you want to findInterval for several values. You could write a loop or use an apply function.


2 Answers

If I understand your question correctly, here are four methods to do the equivalent of Excel's VLOOKUP and fill down using R:

# load sample data from Q hous <- read.table(header = TRUE,                     stringsAsFactors = FALSE,  text="HouseType HouseTypeNo Semi            1 Single          2 Row             3 Single          2 Apartment       4 Apartment       4 Row             3")  # create a toy large table with a 'HouseType' column  # but no 'HouseTypeNo' column (yet) largetable <- data.frame(HouseType = as.character(sample(unique(hous$HouseType), 1000, replace = TRUE)), stringsAsFactors = FALSE)  # create a lookup table to get the numbers to fill # the large table lookup <- unique(hous)   HouseType HouseTypeNo 1      Semi           1 2    Single           2 3       Row           3 5 Apartment           4 

Here are four methods to fill the HouseTypeNo in the largetable using the values in the lookup table:

First with merge in base:

# 1. using base  base1 <- (merge(lookup, largetable, by = 'HouseType')) 

A second method with named vectors in base:

# 2. using base and a named vector housenames <- as.numeric(1:length(unique(hous$HouseType))) names(housenames) <- unique(hous$HouseType)  base2 <- data.frame(HouseType = largetable$HouseType,                     HouseTypeNo = (housenames[largetable$HouseType])) 

Third, using the plyr package:

# 3. using the plyr package library(plyr) plyr1 <- join(largetable, lookup, by = "HouseType") 

Fourth, using the sqldf package

# 4. using the sqldf package library(sqldf) sqldf1 <- sqldf("SELECT largetable.HouseType, lookup.HouseTypeNo FROM largetable INNER JOIN lookup ON largetable.HouseType = lookup.HouseType") 

If it's possible that some house types in largetable do not exist in lookup then a left join would be used:

sqldf("select * from largetable left join lookup using (HouseType)") 

Corresponding changes to the other solutions would be needed too.

Is that what you wanted to do? Let me know which method you like and I'll add commentary.

like image 105
Ben Avatar answered Oct 22 '22 11:10

Ben


I think you can also use match():

largetable$HouseTypeNo <- with(lookup,                      HouseTypeNo[match(largetable$HouseType,                                        HouseType)]) 

This still works if I scramble the order of lookup.

like image 26
Ben Bolker Avatar answered Oct 22 '22 10:10

Ben Bolker