
Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level

Tags:

r

I have an R data frame containing a factor that I want to "expand" so that for each factor level, there is an associated column in a new data frame, which contains a 1/0 indicator. E.g., suppose I have:

df.original <- data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c(1, 2, 3, 4))

I want:

df.desired <- data.frame(foo = c(1, 1, 0, 0), bar = c(0, 0, 1, 1), ham = c(1, 2, 3, 4))

Certain analyses (e.g., principal component analysis) need a completely numeric data frame, so I thought this feature might be built in. Writing a function to do this shouldn't be too hard, but I can foresee some challenges relating to column names, and if something exists already, I'd rather use that.

John Horton asked Feb 19 '11 03:02


2 Answers

Use the model.matrix function:

model.matrix( ~ Species - 1, data=iris ) 
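Applied to the data frame from the question, the same call might look like the sketch below. The names `indicators` and `df.expanded` are made up for illustration, and `eggs` is wrapped in `factor()` explicitly because R 4.0 and later no longer converts strings to factors by default. `model.matrix` prefixes each column with the variable name, so the prefix is stripped afterwards.

```r
df.original <- data.frame(eggs = factor(c("foo", "foo", "bar", "bar")),
                          ham = c(1, 2, 3, 4))

# One 0/1 column per level of eggs; "- 1" drops the intercept,
# so no level is absorbed into a baseline.
indicators <- model.matrix(~ eggs - 1, data = df.original)

# model.matrix names the columns "eggsbar" and "eggsfoo"; strip the prefix.
colnames(indicators) <- sub("^eggs", "", colnames(indicators))

# Recombine the indicator columns with the remaining numeric column.
df.expanded <- cbind(as.data.frame(indicators), ham = df.original$ham)
```

The columns come out in level order (`bar`, `foo`, `ham`), not in the order of first appearance used in `df.desired`; reorder them afterwards if that matters.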
Greg Snow answered Sep 22 '22 15:09


If your data frame contains only factors (or you are working on a subset of variables that are all factors), you can also use the acm.disjonctif function from the ade4 package:

R> library(ade4)
R> df <- data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c("red","blue","green","red"))
R> acm.disjonctif(df)
  eggs.bar eggs.foo ham.blue ham.green ham.red
1        0        1        0         0       1
2        0        1        1         0       0
3        1        0        0         1       0
4        1        0        0         0       1

Not exactly the case you are describing, but it can be useful too...
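For comparison, a similar all-factor table can probably be built in base R without ade4. This is a sketch, not part of the answer above: it uses `model.matrix` with `contrasts.arg` so that every factor keeps all of its levels (by default, only the first factor in the formula would). The name `full` is hypothetical.

```r
df <- data.frame(eggs = factor(c("foo", "foo", "bar", "bar")),
                 ham  = factor(c("red", "blue", "green", "red")))

# contrasts(x, contrasts = FALSE) yields an identity coding,
# i.e. one indicator column per level, for each factor in df.
full <- model.matrix(~ . - 1, data = df,
                     contrasts.arg = lapply(df, contrasts, contrasts = FALSE))

full <- as.data.frame(full)
```

The column names here are concatenated without a separator (`eggsbar`, `hamred`, and so on), unlike the `eggs.bar` style shown above.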

juba answered Sep 21 '22 15:09