Translate a vector of values using a key value mapping in R (equivalent to a HashMap)

I need to translate the values in a vector according to a mapping of key value pairs:

vector <- c("dog","ant","eagle","ant","eagle","parrot") 

  "dog"  "ant"  "eagle"  "ant"  "eagle"  "parrot"


mapping <- data.frame(key=c("dog","cat","elephant","ant","parrot","eagle"),value=c("mammal","mammal","mammal","insect","bird","bird"))

  key      value
  dog      mammal
  cat      mammal
  elephant mammal
  ant      insect
  parrot   bird
  eagle    bird

The desired output would be like this:

output <- c("mammal", "insect", "bird", "insect", "bird", "bird")
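For reference, here is a minimal base R sketch that produces exactly this output with match(). It assumes the vector and mapping objects shown above, which are re-created so the snippet runs on its own. stringsAsFactors = FALSE keeps plain character columns; before R 4.0.0, data.frame() converted strings to factors by default. Keys that are missing from the mapping come out as NA.

```r
vector <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")
mapping <- data.frame(
  key   = c("dog", "cat", "elephant", "ant", "parrot", "eagle"),
  value = c("mammal", "mammal", "mammal", "insect", "bird", "bird"),
  stringsAsFactors = FALSE  # plain character columns (the default only since R 4.0.0)
)

# match() gives, for each element of `vector`, the row index of that key in
# mapping$key (NA if absent); indexing mapping$value by it translates the vector
output <- mapping$value[match(vector, mapping$key)]
print(output)
```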

In the real dataset I have to translate about 10,000 input vectors with an average length of about 15 elements. The mapping data frame has on the order of a million keys, which map to about 100,000 distinct values (classes).
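Because the workload is many short vectors, one option is to translate all of them in a single vectorised call instead of one call per vector. This is a minimal sketch that assumes the inputs are held in a list. The names vectors and translate_all and the list contents are hypothetical and only serve as an illustration. Whether this is faster at the stated scale would have to be measured.

```r
mapping <- data.frame(
  key   = c("dog", "cat", "elephant", "ant", "parrot", "eagle"),
  value = c("mammal", "mammal", "mammal", "insect", "bird", "bird"),
  stringsAsFactors = FALSE
)

# Hypothetical input: a list of character vectors to translate
vectors <- list(
  c("dog", "ant", "eagle"),
  c("ant", "eagle", "parrot"),
  character(0),
  c("cat", "elephant")
)

# Hypothetical helper: translate every vector with a single match() call
translate_all <- function(vectors, keys, values) {
  lens <- lengths(vectors)                    # remember each vector's length
  flat <- unlist(vectors, use.names = FALSE)  # concatenate all inputs
  translated <- values[match(flat, keys)]     # one vectorised lookup; NA if missing
  # Split the result back into one vector per input; explicit factor levels
  # preserve the input order and yield character(0) for empty inputs
  groups <- factor(rep(seq_along(vectors), lens), levels = seq_along(vectors))
  unname(split(translated, groups))
}

out <- translate_all(vectors, mapping$key, mapping$value)
str(out)
```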

The problem itself seems rather basic, but the bottleneck is runtime. In other programming languages you would probably store the mapping in a HashMap and then loop through the vector. Every solution I have come up with in R so far is orders of magnitude slower than a simple HashMap-based one in Java or Python.
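The closest built-in analogue of a HashMap in base R is an environment, which is hashed by default. The sketch below assumes the example data from the question. lookup_env is an illustrative name. Keys are inserted with list2env(), and several keys are retrieved at once with mget(). Note that an environment cannot hold an empty string as a key.

```r
vector <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")
mapping <- data.frame(
  key   = c("dog", "cat", "elephant", "ant", "parrot", "eagle"),
  value = c("mammal", "mammal", "mammal", "insect", "bird", "bird"),
  stringsAsFactors = FALSE
)

# A hashed environment acts as the key/value store; size is the initial table size
lookup_env <- new.env(hash = TRUE, size = nrow(mapping))

# list2env() assigns each named list element as a variable: key -> value
invisible(list2env(as.list(setNames(mapping$value, mapping$key)), envir = lookup_env))

# mget() looks up several keys at once; ifnotfound = NA covers unknown keys
output <- unlist(mget(vector, envir = lookup_env, ifnotfound = NA),
                 use.names = FALSE)
print(output)
```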

Is there a more efficient data structure to store the mapping than a data frame?
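One alternative to a data frame is a named character vector, where the names are the keys and the elements are the values. This is a sketch based on the question's example data. lookup is an illustrative name. Subsetting by name returns NA for unknown keys and the first match when a key is duplicated. No claim is made here about how it performs with a million keys.

```r
vector <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")
mapping <- data.frame(
  key   = c("dog", "cat", "elephant", "ant", "parrot", "eagle"),
  value = c("mammal", "mammal", "mammal", "insect", "bird", "bird"),
  stringsAsFactors = FALSE
)

# Named character vector: names hold the keys, elements hold the values
lookup <- setNames(mapping$value, mapping$key)

# Indexing by name translates each element (NA for unknown keys);
# unname() drops the key names carried over from `lookup`
output <- unname(lookup[vector])
print(output)
```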

What would be the most runtime-efficient solution to this problem in R?

asked Aug 04 '15 at 12:08 by datamole





1 Answer

There is a package called hashmap which is perfect for this:

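library(hashmap)

# Build the hash map once from the key and value columns
hash_lookup = hashmap(mapping$key, mapping$value)

# Look up every element of vector in a single call
output = hash_lookup[[vector]]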

Result:

> hash_lookup
## (character) => (character)
##       [cat] => [mammal]   
##  [elephant] => [mammal]   
##       [ant] => [insect]   
##       [dog] => [mammal]   
##     [eagle] => [bird]     
##    [parrot] => [bird]     

> output
[1] "mammal" "insect" "bird"   "insect" "bird"   "bird"

Data:

vector <- c("dog","ant","eagle","ant","eagle","parrot")

mapping <- data.frame(key=c("dog","cat","elephant","ant","parrot","eagle"),
                      value=c("mammal","mammal","mammal","insect","bird","bird"),
                      stringsAsFactors = FALSE)

Note:

I still have to test this on a bigger dataset, but this method should be very fast, since the package is implemented with Rcpp internally.
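One way to try this at a larger scale is a rough timing sketch on synthetic data. Everything in it is an assumption: the generated keys, classes, and sizes are made up and scaled down from the figures in the question. The hashmap part only runs if that package is installed. The sketch compares the answer's approach with a single base R match() on the same flattened input. Timings depend on the machine, so no numbers are given here.

```r
set.seed(42)
n_keys    <- 1e5  # the question mentions ~1e6 keys; scaled down here
n_classes <- 1e4  # the question mentions ~1e5 distinct classes
n_vectors <- 1e4  # ~10,000 input vectors ...
avg_len   <- 15   # ... with an average length of ~15

# Synthetic mapping: each key is assigned a random class
keys   <- paste0("key", seq_len(n_keys))
values <- paste0("class", sample.int(n_classes, n_keys, replace = TRUE))

# Synthetic inputs: vectors of Poisson-distributed length drawn from the keys
vectors <- replicate(n_vectors,
                     sample(keys, rpois(1, avg_len), replace = TRUE),
                     simplify = FALSE)
flat <- unlist(vectors, use.names = FALSE)

# Base R: one vectorised match() over all query elements
t_base <- system.time(
  res_base <- values[match(flat, keys)]
)
print(t_base)

# hashmap package (only if installed): build once, then look up with [[
if (requireNamespace("hashmap", quietly = TRUE)) {
  t_hash <- system.time({
    h <- hashmap::hashmap(keys, values)
    res_hash <- h[[flat]]
  })
  print(t_hash)
  # Both approaches should agree element by element
  stopifnot(length(res_hash) == length(res_base), all(res_hash == res_base))
}
```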

answered Nov 01 '22 at 16:11 by acylam
