Translate a vector of values using a key value mapping in R (equivalent to a HashMap)

I need to translate the values in a vector according to a mapping of key value pairs:

vector <- c("dog","ant","eagle","ant","eagle","parrot") 

  "dog"  "ant"  "eagle"  "ant"  "eagle"  "parrot"

mapping <- data.frame(key=c("dog","cat","elephant","ant","parrot","eagle"),value=c("mammal","mammal","mammal","insect","bird","bird"))

  key      value
  dog      mammal
  cat      mammal
  elephant mammal
  ant      insect
  parrot   bird
  eagle    bird

The desired output would be like this:

output <- c("mammal", "insect", "bird", "insect", "bird", "bird")

In the real dataset I have to translate ~10000 input vectors with an average length of ~15, and the mapping data frame has on the order of a million keys that map to about 100000 unique classes on the value side.

The problem itself appears rather basic to me, but the bottleneck is runtime. In other programming languages you would probably use a HashMap for the mapping and then loop through the vector. Any solution in R I could come up with so far is orders of magnitude slower than a simple HashMap-based one in Java or Python (see comments below).

Is there a more efficient data structure to store the mapping than a data frame?

What would be the most runtime-efficient solution to this problem in R?

1 Answer

There is a package called hashmap which is perfect for this:


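library(hashmap)

# Build the lookup once: keys and values are taken column-wise from the mapping
hash_lookup = hashmap(mapping$key, mapping$value)

# Vectorised lookup: one value per element of the input vector
output = hash_lookup[[vector]]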


> hash_lookup
## (character) => (character)
##       [cat] => [mammal]   
##  [elephant] => [mammal]   
##       [ant] => [insect]   
##       [dog] => [mammal]   
##     [eagle] => [bird]     
##    [parrot] => [bird]     

> output
[1] "mammal" "insect" "bird"   "insect" "bird"   "bird"


Data used above:

vector <- c("dog","ant","eagle","ant","eagle","parrot")

mapping <- data.frame(key=c("dog","cat","elephant","ant","parrot","eagle"),
                      value=c("mammal","mammal","mammal","insect","bird","bird"),
                      stringsAsFactors = FALSE)


I have not tested this on a bigger dataset yet, but this method should be very fast, since hashmap is implemented with Rcpp internally.
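For comparison, here is a minimal base R sketch that needs no extra package. It relies on match(), which as far as I know builds a hash table over the lookup keys internally, so one vectorised call behaves much like a HashMap lookup. The list `inputs` is a hypothetical stand-in for the ~10000 input vectors from the question. The idea is to flatten all vectors, translate them in a single match() call, and then restore the original list shape with relist(). I have not benchmarked this against the real data, so treat it as a starting point rather than a measured result.

```r
# Toy data from the question.
vector <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")
mapping <- data.frame(
  key   = c("dog", "cat", "elephant", "ant", "parrot", "eagle"),
  value = c("mammal", "mammal", "mammal", "insect", "bird", "bird"),
  stringsAsFactors = FALSE
)

# match() gives, for each element of `vector`, the row index of its first
# occurrence in mapping$key (NA when the key is not in the mapping).
output <- mapping$value[match(vector, mapping$key)]
print(output)

# Hypothetical stand-in for the many input vectors in the real dataset.
inputs <- list(
  c("dog", "ant"),
  c("eagle", "ant", "eagle", "parrot")
)

# Flatten once, translate once, then split back into the original shape.
flat       <- unlist(inputs, use.names = FALSE)
translated <- mapping$value[match(flat, mapping$key)]
outputs    <- relist(translated, skeleton = inputs)
print(outputs)
```

Doing the translation in one call on the flattened vector avoids rebuilding the lookup for each of the input vectors, which is usually where a naive per-vector loop loses most of its time. Keys that are missing from the mapping come back as NA.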

