
How do I replace numeric codes with value labels from a lookup table?

This question is related to an earlier question, but is not quite the same.

Say I have this data frame,

df <- data.frame(
                id = c(1:6),
                profession = c(1, 5, 4, NA, 0, 5))

and a named vector with human-readable labels for the profession codes. Say,

profession.code <- c(
                     Optometrists=1, Accountants=2, Veterinarians=3, 
                     `Financial analysts`=4,  Nurses=5)

Now, I'm looking for the easiest way to replace the values in df$profession with the labels found in profession.code, preferably without the use of special libraries, unless they shorten the code significantly.

I would like my end result to be

df <- data.frame(
                id = c(1:6),
                profession = c("Optometrists", "Nurses", 
                "Financial analysts", NA, 0, "Nurses"))

Any help would be greatly appreciated.

Thanks, Eric

Eric Fail asked Apr 03 '12 22:04


3 Answers

You can do it this way:

df <- data.frame(id = c(1:6),
                 profession = c(1, 5, 4, NA, 0, 5))

profession.code <- c(`0` = 0, Optometrists=1, Accountants=2, Veterinarians=3, 
                     `Financial analysts`=4,  Nurses=5)

df$profession.str <- names(profession.code)[match(df$profession, profession.code)]
df
#   id profession     profession.str
# 1  1          1       Optometrists
# 2  2          5             Nurses
# 3  3          4 Financial analysts
# 4  4         NA               <NA>
# 5  5          0                  0
# 6  6          5             Nurses

Note that I had to add a 0 entry in your profession.code vector to account for those zeroes.
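To see why this one-liner works, it may help to look at the intermediate index vector that match() returns. The following is a minimal sketch that reuses the objects defined above; the name idx is introduced here only for illustration.

```r
df <- data.frame(id = 1:6,
                 profession = c(1, 5, 4, NA, 0, 5))
profession.code <- c(`0` = 0, Optometrists = 1, Accountants = 2,
                     Veterinarians = 3, `Financial analysts` = 4, Nurses = 5)

# match() returns, for each code in df$profession, its position in
# profession.code (NA where the code is missing or not found)
idx <- match(df$profession, profession.code)
idx
# [1]  2  6  5 NA  1  6

# indexing the names by position turns the codes into labels
names(profession.code)[idx]
# [1] "Optometrists"       "Nurses"             "Financial analysts"
# [4] NA                   "0"                  "Nurses"
```

Because the lookup goes through positions, the codes do not need to be consecutive or sorted; they only need to appear in profession.code.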

EDIT: here is an updated solution to account for Eric's comment below that the data may contain any number of profession codes for which there are no corresponding descriptions:

match.idx <- match(df$profession, profession.code)
df$profession.str <- ifelse(is.na(match.idx),
                            df$profession,
                            names(profession.code)[match.idx])
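
If the same lookup is needed for several columns, the logic above could be wrapped in a small helper. This is only a sketch: decode_codes is a hypothetical name, not a base R function, and it assumes the lookup is a named vector like the original profession.code (without the added 0 entry).

```r
# hypothetical helper: replace codes with labels and keep any code that has
# no label as text instead of turning it into NA
decode_codes <- function(x, lookup) {
  idx <- match(x, lookup)
  ifelse(is.na(idx), as.character(x), names(lookup)[idx])
}

profession.code <- c(Optometrists = 1, Accountants = 2, Veterinarians = 3,
                     `Financial analysts` = 4, Nurses = 5)

decode_codes(c(1, 5, 4, NA, 0, 5), profession.code)
# [1] "Optometrists"       "Nurses"             "Financial analysts"
# [4] NA                   "0"                  "Nurses"
```

The result is always a character vector, so unmatched codes such as 0 come back as the string "0", and a missing code stays NA.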

flodel answered Nov 08 '22 13:11


I played around with it and this is my current solution using the car package.

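library(car)

# build one "code='label';" clause per lookup entry; paste0() avoids the
# padding spaces that paste() would insert around each label
pLoop <- function(v) paste0(profession.code[v], "='", names(profession.code)[v], "';")
df$profession <- recode(df$profession,
                        paste(sapply(seq_along(profession.code), pLoop), collapse = ""))

df
# id           profession
#  1         Optometrists
#  2               Nurses
#  3   Financial analysts
#  4                 <NA>
#  5                    0
#  6               Nurses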

I'm still interested in whether anyone has other suggestions for a solution. I would prefer to do it using only base R functions.
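
For comparison, one base R option that is sometimes used for this kind of recoding is factor() with explicit levels and labels. The sketch below is not from the answers here, and it behaves differently from the desired output in one respect: codes without a label, such as 0, become NA instead of being kept.

```r
df <- data.frame(id = 1:6,
                 profession = c(1, 5, 4, NA, 0, 5))
profession.code <- c(Optometrists = 1, Accountants = 2, Veterinarians = 3,
                     `Financial analysts` = 4, Nurses = 5)

# levels are the numeric codes, labels are the matching names;
# any value not listed in levels (here 0) is turned into NA
prof <- factor(df$profession,
               levels = profession.code,
               labels = names(profession.code))
as.character(prof)
# [1] "Optometrists"       "Nurses"             "Financial analysts"
# [4] NA                   NA                   "Nurses"
```

This may be acceptable when every code is known in advance; otherwise a match()-based lookup that falls back to the original code is the safer choice.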

Eric Fail answered Nov 08 '22 14:11


I personally like the way the arules package deals with this problem, using the decode function. From the documentation:

library(arules)
data("Adult")

## Example 1: Manual decoding
## get code
iLabels <- itemLabels(Adult)
head(iLabels)

## get undecoded list and decode in a second step
list <- LIST(Adult[1:5], decode = FALSE)
list

decode(list, itemLabels = iLabels)

An advantage is that the package also offers the functions encode and recode, whose purposes should be self-explanatory.

ATN answered Nov 08 '22 13:11