
Transparent lookup table for numeric values without using data.frame?

Tags: r

Advanced R discusses the idea of using character subsetting for lookup tables.

x <- c("m", "f", "u", "f", "f", "m", "m")
lookup <- c(m = "Male", f = "Female", u = NA)
lookup[x]
#>        m        f        u        f        f        m        m 
#>   "Male" "Female"       NA "Female" "Female"   "Male"   "Male"

Created on 2019-03-04 by the reprex package (v0.2.1)

However, this idea does not work for numeric lookups, because names is a special attribute that is required to be a character vector.
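
To illustrate the problem, here is a small sketch (the values are made up for the example). Numeric names are coerced to character, and a numeric subscript selects by position rather than by name:

```r
# Names are always stored as character, even when assigned from numbers.
lookup <- c("Excellent", "Good", "Poor")
names(lookup) <- c(3, 2, 1)
is.character(names(lookup))
#> [1] TRUE

# A numeric subscript is a position, not a key:
# this returns the third element, not the one named "3".
unname(lookup[3])
#> [1] "Poor"
```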

What is a simple equivalent for numeric lookups that does not require a data.frame?

I want to avoid a data.frame solution because there the mapping between keys and values is based only on order, as opposed to the more transparent 3 = 'Excellent', 2 = 'Good', 1 = 'Poor'.


The paragraph following the character lookup tables in Advanced R suggests a solution using data.frame:

grades <- c(1, 2, 2, 3, 1)

info <- data.frame(
  grade = 3:1,
  desc = c("Excellent", "Good", "Poor"),
  fail = c(FALSE, FALSE, TRUE)
)

info[grades, 'desc']
#> [1] Excellent Good      Good      Poor      Excellent
#> Levels: Excellent Good Poor

Created on 2019-03-04 by the reprex package (v0.2.1)

asked Mar 04 '19 20:03 by robust


2 Answers

If your keys will only be positive integers, you can use the index value as suggested by Soren in their answer to this question: https://stackoverflow.com/a/54990917
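
A minimal sketch of that index-based approach, assuming the keys are the integers 1 to 3 (the name `lookup_pos` is only for illustration). The value for key k is stored at position k, so the mapping is still defined by order:

```r
# Position k holds the value for key k.
lookup_pos <- c("Poor", "Good", "Excellent")
grades <- c(1, 2, 2, 3, 1)
lookup_pos[grades]
#> [1] "Poor"      "Good"      "Good"      "Excellent" "Poor"
```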


If not, you can still use the names-based strategy you described above by storing your numbers in names(lookup) as character and then using as.character to convert a vector of numeric keys into the right form for matching:

y <- c(1, -2, 1.3, -5)
lookup_num <- c('1' = 'Cat', '-2' = 'Dog', '1.3' = 'Fish', '-5' = 'Hedgehog')
lookup_num[as.character(y)]
         1         -2        1.3         -5 
     "Cat"      "Dog"     "Fish" "Hedgehog" 

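One possible downside of this approach is that the keys are matched as strings, so the names in the lookup table must be written exactly as as.character formats the numbers: a name such as '0.0' or '3.00' will not match the numeric key 0 or 3, since as.character(3) is "3". You need to make sure the names are written in that clean form.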
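
A short sketch of that pitfall and one way to sidestep it, assuming you can build the table from numeric keys (the names `lookup_bad` and `lookup_ok` are made up here). `setNames()` coerces numeric names to character, so 3.00 is stored as "3":

```r
# A key typed as the string "3.00" never matches the number 3,
# because as.character(3) is "3".
lookup_bad <- c("3.00" = "Bird")
unname(lookup_bad[as.character(3)])
#> [1] NA

# Building the names from numbers avoids the mismatch.
lookup_ok <- setNames(c("Bird", "Cat"), c(3.00, 1))
unname(lookup_ok[as.character(c(1, 3))])
#> [1] "Cat"  "Bird"
```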


If performance is not a huge concern, you can reverse the order of key and value, putting your numeric key as the value and the character lookup value as the name, and then use sapply to look up each key:

lookup_num <- c('Cat' = 1, 'Dog' = -2, 'Fish' = 1.3, 'Hedgehog' = -5)
keys <- c(-2, 1.3, -2, 1)
sapply(keys, function(x) which(lookup_num == x))
 Dog Fish  Dog  Cat 
   2    3    2    1 

This has the advantage of using numeric matching, which is not affected by how the numbers are formatted, and it gives you a lot of flexibility in how you match (for example, you could use abs(lookup_num - x) < 0.1 to add wiggle room to your numeric matching).
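
As a rough sketch of the wiggle-room idea, assuming a tolerance of 0.1 and returning the matched name instead of the index (the helper `lookup_near` and the sample keys are hypothetical):

```r
lookup_num <- c("Cat" = 1, "Dog" = -2, "Fish" = 1.3, "Hedgehog" = -5)
keys <- c(-2.04, 1.32, 7)

# Return the name of the single entry within `tol` of x, or NA otherwise.
lookup_near <- function(x, tol = 0.1) {
  hits <- names(lookup_num)[abs(lookup_num - x) < tol]
  if (length(hits) == 1) hits else NA_character_
}

vapply(keys, lookup_near, character(1))
#> [1] "Dog"  "Fish" NA
```

Returning NA when zero or several entries fall inside the tolerance is one possible choice. Depending on the data, picking the nearest entry may be preferable.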

The downside is that it has poor time complexity, since every key is compared against every entry of the lookup table, but if your list of keys and/or lookup table are not huge, you won't notice at all.

answered Nov 09 '22 01:11 by divibisan


You could consider using a lookup function instead. For example, here's a simple helper function that creates a lookup function for you:

create.lookup = function(name, value) {
  function(lookup.name) value[match(lookup.name, name)]
}

An example of using this:

grades <- c(1, 2, 2, 3, 1)
lookup = create.lookup(c(3, 2, 1), c("Excellent", "Good", "Poor"))
lookup(grades)
# [1] "Poor"      "Good"      "Good"      "Excellent" "Poor"     

It also works with negative and non-integer values:

grades <- c(2, 1.1, 2, -3, 1.1)
lookup = create.lookup(c(1.1, 2, -3), c("Excellent", "Good", "Poor"))
lookup(grades)
# [1] "Good"      "Excellent" "Good"      "Poor"      "Excellent"

And it still works even if the numbers are written differently:

grades <- c(2.000, 1.10, 2, -3e0, 001.1)
lookup(grades)
# [1] "Good"      "Excellent" "Good"      "Poor"      "Excellent"

As an added bonus, the same method also works for character-type lookups, thus providing a single method for the various use cases:

grades <- c('p', 'g', 'g', 'e', 'p')
lookup = create.lookup(c('e', 'g', 'p'), c("Excellent", "Good", "Poor"))
lookup(grades)
# [1] "Poor"      "Good"      "Good"      "Excellent" "Poor"     
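
One behaviour worth knowing, sketched here with made-up keys: match() returns NA for a key that is not in the table, so the lookup function returns NA for that key rather than raising an error. The definition of create.lookup is repeated so the snippet runs on its own:

```r
create.lookup <- function(name, value) {
  function(lookup.name) value[match(lookup.name, name)]
}

lookup <- create.lookup(c(3, 2, 1), c("Excellent", "Good", "Poor"))

# 4 is not a known key, so its result is NA.
lookup(c(3, 4, 1))
#> [1] "Excellent" NA          "Poor"
```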

answered Nov 09 '22 02:11 by dww