
Creating a new variable from a lookup table

Tags: dataframe, r

I have the following columns in my data set:

presult     aresult
  I         single
  I         double
  I         triple
  I         home run
  SS        strikeout

I would like to add a third column "bases" that is dependent upon the value of the result in column aresult.

For example, I would like bases to be 1 for a single, 2 for a double, 3 for a triple, 4 for a home run, and 0 for a strikeout.

Usually I would create the new variable like this:

dataset$base<-ifelse(dataset$aresult=="single", 1, 0)

The problem is that I don't know how to code the remaining values (double, triple, home run) into the new variable: with this approach, everything that is not a single is set to zero.

Burton Guster asked Dec 08 '11 15:12


3 Answers

Here is how to use a named vector for the lookup:

Define test data:

dat <- data.frame(
    presult = c(rep("I", 4), "SS", "ZZ"),
    aresult = c("single", "double", "triple", "home run", "strikeout", "home run"),
    stringsAsFactors=FALSE
)

Define a named numeric vector with the scores:

score <- c(single=1, double=2, triple=3, `home run`=4,  strikeout=0)

Use vector indexing to match the scores against results:

dat$base <- score[dat$aresult]
dat
  presult   aresult base
1       I    single    1
2       I    double    2
3       I    triple    3
4       I  home run    4
5      SS strikeout    0
6      ZZ  home run    4
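
One thing worth checking with this approach is what happens to values that are missing from the lookup. The sketch below assumes a hypothetical result "walk" that has no entry in score; indexing by a name that does not exist returns NA rather than an error. It also wraps the key in as.character(), because indexing with a factor would use the factor's integer codes instead of its labels.

```r
# Lookup vector as defined in the answer
score <- c(single = 1, double = 2, triple = 3, `home run` = 4, strikeout = 0)

# "walk" is a hypothetical value with no entry in the lookup
aresult <- factor(c("single", "walk", "home run"))

# as.character() forces lookup by label, not by the factor's integer code;
# unname() drops the names that indexing carries over from `score`
base <- unname(score[as.character(aresult)])
base
```

Rows that come back as NA can then be found with is.na(base) and handled explicitly.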

Additional information:

If you don't wish to construct the named vector by hand, say in the case where you have large amounts of data, then do it as follows:

scores <- c(1:4, 0)
names(scores) <- c("single", "double", "triple", "home run", "strikeout")

(Or read the values and names from existing data. The point is to construct a numeric vector and then assign names.)
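
As a rough sketch of reading the names and values from existing data: the lookup data frame below is a hypothetical stand-in for whatever table you already have (for example, one loaded with read.csv()), and setNames() builds the named vector from its two columns.

```r
# Hypothetical lookup table; in practice this might come from a file
lookup <- data.frame(
  aresult = c("single", "double", "triple", "home run", "strikeout"),
  base    = c(1, 2, 3, 4, 0),
  stringsAsFactors = FALSE
)

# setNames(values, names) returns the values with the names attached
scores <- setNames(lookup$base, lookup$aresult)
scores[["home run"]]
```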

Andrie answered Oct 04 '22 05:10


Define your lookup table:

lookup= data.frame( 
        base=c(0,1,2,3,4), 
        aresult=c("strikeout","single","double","triple","home run"))

Then use join from plyr (a left join by default, so every row of dataset is kept):

library(plyr)
dataset = join(dataset,lookup,by='aresult')
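
If you would rather not depend on plyr, the same lookup can be sketched in base R with match(). This is a swapped-in alternative, not the join() call from the answer, and the small dataset below is made up for illustration. Like a left join, it keeps the rows of dataset in their original order, and unmatched values would get NA.

```r
# Made-up sample data for illustration
dataset <- data.frame(
  presult = c("I", "I", "SS"),
  aresult = c("single", "home run", "strikeout"),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  base    = c(0, 1, 2, 3, 4),
  aresult = c("strikeout", "single", "double", "triple", "home run"),
  stringsAsFactors = FALSE
)

# match() gives, for each row of dataset, the position of its aresult in lookup
idx <- match(dataset$aresult, lookup$aresult)
dataset$base <- lookup$base[idx]
dataset
```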

LouisChiffre answered Oct 04 '22 07:10


An alternative to Dieter's answer:

dat <- data.frame(
  presult = c(rep("I", 4), "SS", "ZZ"),
  aresult = c("single", "double", "triple", "home run", "strikeout", "home run"),
  stringsAsFactors=FALSE
)

dat$base <- as.integer(factor(dat$aresult,
  levels=c("strikeout","single","double","triple","home run")))-1
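
A small check of why this works, using a made-up vector: factor() assigns integer codes 1 to 5 in the order given by levels, so subtracting 1 yields 0 to 4. The trick therefore relies on the desired values being consecutive integers in level order. The value "walk" below is hypothetical and shows that anything outside levels becomes NA.

```r
# "walk" is a hypothetical value that is not among the levels
aresult <- c("single", "home run", "strikeout", "walk")

lv <- c("strikeout", "single", "double", "triple", "home run")

# Codes follow the order of `lv`: strikeout = 1, ..., home run = 5
base <- as.integer(factor(aresult, levels = lv)) - 1
base
```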

Joshua Ulrich answered Oct 04 '22 07:10