
Loop to create dummy variable R

Tags: loops, r

I am trying to generate dummy variables (must be 1/0) using a loop, based on the most frequent responses of a variable. After lots of googling, I haven't managed to come up with a solution. I have extracted the most frequent responses (strings, say the top 5 are "A","B",...,"E") using

top5 <- names(head(sort(table(data$var1), decreasing = TRUE), 5))

I would like the loop to check whether another variable ("var2") equals A; if so, set the dummy to 1, otherwise to 0, then give a summary using aggregate(). In Stata, I can refer to the looped variable i using `i', but not in R... The code that does not work is:

for(i in top5) {
   data$i.dummy <- ifelse(data$var2=="i",1,0)
   aggregate(data$i.dummy~data$age+data$year,data,mean)
}

Any suggestions?

asked Dec 21 '22 04:12 by kirk


2 Answers

If you want one column per item in your top 5, then I would use sapply along the elements of top5. There is no need for ifelse, because == returns a logical vector (TRUE where the comparison holds, FALSE otherwise), and that converts directly to 1 and 0.

Here we cbind a matrix of 5 columns, one for each element of top5, containing 1 where the row of data$var2 equals the respective element of top5 and 0 otherwise:

data <- cbind( data , sapply( top5 , function(x) as.integer( data$var2 == x ) ) )

If you want one column for matches of any of top5 it's even easier:

data$dummies <- as.integer( data$var2 %in% top5 )

The as.integer() in both cases is used to turn TRUE or FALSE to 1 and 0 respectively.

A cut down example to illustrate how it works:

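set.seed(123)
top2 <- c("A","B")
#  The output below is from R < 3.6.0; newer versions use a different default sample() algorithm, so the draws may differ
data <- data.frame( var2 = sample(LETTERS[1:4],6,replace=TRUE) )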

#  Make dummy variables, one column for each element in topX vector
data <- cbind( data , sapply( top2 , function(x) as.integer( data$var2 == x ) ) )
data
#  var2 A B
#1    B 0 1
#2    D 0 0
#3    B 0 1
#4    D 0 0
#5    D 0 0
#6    A 1 0

#  Make single column for all elements in topX vector (appended as the last column)
data$ANY <- as.integer( data$var2 %in% top2 )
data
#  var2 A B ANY
#1    B 0 1   1
#2    D 0 0   0
#3    B 0 1   1
#4    D 0 0   0
#5    D 0 0   0
#6    A 1 0   1
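
The question also asked for a summary with aggregate(). A minimal sketch of that step, assuming the data frame has age and year columns as in the question (the values here are made up for illustration), could summarise all the dummy columns in one call:

```r
# Hypothetical toy data: age and year are invented grouping columns
data <- data.frame(
  var2 = c("B", "D", "B", "D", "D", "A"),
  age  = c(30, 30, 40, 40, 30, 40),
  year = c(2001, 2001, 2001, 2001, 2002, 2002),
  stringsAsFactors = FALSE
)
top2 <- c("A", "B")

# One 0/1 column per element of top2, named after that element
dummies <- sapply(top2, function(x) as.integer(data$var2 == x))
data <- cbind(data, dummies)

# Mean of each dummy within every age/year cell,
# i.e. the share of rows in that cell giving that response
shares <- aggregate(data[top2], by = data[c("age", "year")], FUN = mean)
shares
```

Because the dummies are 0/1, each mean is simply the proportion of rows in that age/year cell whose var2 matches the response.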
answered Dec 22 '22 17:12 by Simon O'Hanlon


See fortune(312) (from the fortunes package), then read the help page ?"[[" and possibly the help for paste0.
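
As a rough sketch of what those help pages lead to: data$i.dummy always refers to a column literally named "i.dummy", and "i" in quotes is the one-letter string, not the loop value. Building the column name with paste0 and assigning through [[ avoids both problems. The toy data and the results list below are assumptions made for illustration, not part of the question:

```r
# Hypothetical toy data; column names follow the question
data <- data.frame(
  var2 = c("A", "B", "A", "C", "B", "A"),
  age  = c(30, 30, 40, 40, 30, 40),
  year = c(2001, 2001, 2001, 2001, 2002, 2002),
  stringsAsFactors = FALSE
)
top5 <- c("A", "B")  # stand-in for the real vector of top responses

results <- list()
for (i in top5) {
  col <- paste0(i, ".dummy")                  # e.g. "A.dummy"
  data[[col]] <- as.integer(data$var2 == i)   # compare with the value of i
  # Store the summary: values are not auto-printed inside a for loop
  results[[i]] <- aggregate(data[col], by = data[c("age", "year")], FUN = mean)
}
results[["A"]]
```

Wrapping the aggregate() call in print() would also work if the summaries only need to be displayed rather than kept.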

Then possibly consider using other tools like model.matrix and sapply rather than doing everything yourself using loops.
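
A minimal sketch of the model.matrix route, assuming var2 has no missing values (rows with NA are dropped under the default na.action); the data and the names mm and dummies are illustrative only:

```r
# Hypothetical toy data
data <- data.frame(var2 = c("B", "D", "B", "D", "D", "A"),
                   stringsAsFactors = FALSE)
top2 <- c("A", "B")

# "~ 0 + ..." drops the intercept, so every level gets its own 0/1 column
mm <- model.matrix(~ 0 + factor(var2), data = data)

# Columns come out in level order; rename them to the bare level names
colnames(mm) <- levels(factor(data$var2))

# Keep only the columns for the responses of interest
dummies <- mm[, top2, drop = FALSE]
dummies
```

This builds indicators for every level first and then subsets, which is convenient when the set of "top" responses may change later.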

answered Dec 22 '22 18:12 by Greg Snow