Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace values in a dataframe based on lookup table

I am having some trouble replacing values in a dataframe. I would like to replace values based on a separate table. Below is an example of what I am trying to do.

I have a table where every row is a customer and every column is an animal they purchased. Lets call this dataframe table.

> table
#       P1     P2     P3
# 1    cat lizard parrot
# 2 lizard parrot    cat
# 3 parrot    cat lizard

I also have a table that I will reference called lookUp.

> lookUp
#      pet   class
# 1    cat  mammal
# 2 lizard reptile
# 3 parrot    bird

What I want to do is create a new table called new with a function replaces all values in table with the class column in lookUp. I tried this myself using an lapply function, but I got the following warnings.

new <- as.data.frame(lapply(table, function(x) {
  gsub('.*', lookUp[match(x, lookUp$pet) ,2], x)}), stringsAsFactors = FALSE)

Warning messages:
1: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
  argument 'replacement' has length > 1 and only the first element will be used
2: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
  argument 'replacement' has length > 1 and only the first element will be used
3: In gsub(".*", lookUp[match(x, lookUp$pet), 2], x) :
  argument 'replacement' has length > 1 and only the first element will be used

Any ideas on how to make this work?

like image 261
jbunk Avatar asked Feb 25 '16 19:02

jbunk


5 Answers

You posted an approach in your question which was not bad. Here's a smiliar approach:

new <- df  # create a copy of df
# using lapply, loop over columns and match values to the look up table. store in "new".
new[] <- lapply(df, function(x) look$class[match(x, look$pet)])

An alternative approach which will be faster is:

new <- df
new[] <- look$class[match(unlist(df), look$pet)]

Note that I use empty brackets ([]) in both cases to keep the structure of new as it was (a data.frame).

(I'm using df instead of table and look instead of lookup in my answer)

like image 158
talat Avatar answered Oct 20 '22 16:10

talat


Another options is a combination of tidyr and dplyr

library(dplyr)
library(tidyr)
table %>%
   gather(key = "pet") %>%
   left_join(lookup, by = "pet") %>%
   spread(key = pet, value = class)
like image 20
Thierry Avatar answered Oct 20 '22 15:10

Thierry


Anytime you have two separate data.frames and are trying to bring info from one to the other, the answer is to merge.

Everyone has their own favorite merge method in R. Mine is data.table.

Also, since you want to do this to many columns, it'll be faster to melt and dcast -- rather than loop over columns, apply it once to a reshaped table, then reshape again.

library(data.table)

#the row names will be our ID variable for melting
setDT(table, keep.rownames = TRUE) 
setDT(lookUp)

#now melt, merge, recast
# melting (reshape wide to long)
table[ , melt(.SD, id.vars = 'rn')     
       # merging
       ][lookup, new_value := i.class, on = c(value = 'pet') 
         #reform back to original shape
         ][ , dcast(.SD, rn ~ variable, value.var = 'new_value')]
#    rn      P1      P2      P3
# 1:  1  mammal reptile    bird
# 2:  2 reptile    bird  mammal
# 3:  3    bird  mammal reptile

In case you find the dcast/melt bit a bit intimidating, here's an approach that just loops over columns; dcast/melt is simply sidestepping the loop for this problem.

setDT(table) #don't need row names this time
setDT(lookUp)

sapply(names(table), #(or to whichever are the relevant columns)
       function(cc) table[lookUp, (cc) := #merge, replace
                            #need to pass a _named_ vector to 'on', so use setNames
                            i.class, on = setNames("pet", cc)])
like image 26
MichaelChirico Avatar answered Oct 20 '22 16:10

MichaelChirico


Make a named vector, and loop through every column and match, see:

# make lookup vector with names
lookUp1 <- setNames(as.character(lookUp$class), lookUp$pet)
lookUp1    
#      cat    lizard    parrot 
# "mammal" "reptile"    "bird" 

# match on names get values from lookup vector
res <- data.frame(lapply(df1, function(i) lookUp1[i]))
# reset rownames
rownames(res) <- NULL

# res
#        P1      P2      P3
# 1  mammal reptile    bird
# 2 reptile    bird  mammal
# 3    bird  mammal reptile

data

df1 <- read.table(text = "
       P1     P2     P3
 1    cat lizard parrot
 2 lizard parrot    cat
 3 parrot    cat lizard", header = TRUE)

lookUp <- read.table(text = "
      pet   class
 1    cat  mammal
 2 lizard reptile
 3 parrot    bird", header = TRUE)
like image 39
zx8754 Avatar answered Oct 20 '22 15:10

zx8754


I did it using the factor built-in.

table$P1 <- factor(table$P1, levels=lookUp$pet, labels=lookUp$class)
table$P2 <- factor(table$P2, levels=lookUp$pet, labels=lookUp$class)
table$P3 <- factor(table$P3, levels=lookUp$pet, labels=lookUp$class)
like image 4
ulidtko Avatar answered Oct 20 '22 15:10

ulidtko