Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level

Tags: r

I have an R data frame containing a factor that I want to "expand" so that for each factor level, there is an associated column in a new data frame, which contains a 1/0 indicator. E.g., suppose I have:

df.original <- data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c(1, 2, 3, 4))

I want:

df.desired <- data.frame(foo = c(1, 1, 0, 0), bar = c(0, 0, 1, 1), ham = c(1, 2, 3, 4))

Because certain analyses need a completely numeric data frame (e.g., principal component analysis), I thought this feature might be built in. Writing a function to do this shouldn't be too hard, but I can foresee some challenges relating to column names, and if something already exists, I'd rather use that.

asked Feb 19 '11 03:02

John Horton

2 Answers

Use the model.matrix function:

model.matrix( ~ Species - 1, data=iris )
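Applied to the data frame from the question, the same idea might look like the sketch below. The names `indicators` and `df.expanded` are made up for illustration, and setting the factor levels in order of appearance is an assumption made only so that the columns come out as foo, bar rather than alphabetically.

```r
df.original <- data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c(1, 2, 3, 4))

# Make eggs an explicit factor; levels follow order of appearance (foo, then bar).
df.original$eggs <- factor(df.original$eggs, levels = unique(df.original$eggs))

# "- 1" drops the intercept, so every level of eggs gets its own 0/1 column.
indicators <- model.matrix(~ eggs - 1, data = df.original)

# model.matrix prefixes each column with the variable name ("eggsfoo", "eggsbar");
# strip the prefix to get plain level names.
colnames(indicators) <- sub("^eggs", "", colnames(indicators))

# Put the indicator columns back next to the numeric column.
df.expanded <- cbind(as.data.frame(indicators), ham = df.original$ham)
print(df.expanded)
```

Note that dropping the intercept yields a full set of indicator columns only for the first factor in the formula; with several factors, the later ones are still coded with contrasts.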

answered Sep 22 '22 15:09

Greg Snow

If your data frame is made up only of factors (or you are working on a subset of variables that are all factors), you can also use the acm.disjonctif function from the ade4 package:

R> library(ade4)
R> df <- data.frame(eggs = c("foo", "foo", "bar", "bar"), ham = c("red","blue","green","red"))
R> acm.disjonctif(df)
  eggs.bar eggs.foo ham.blue ham.green ham.red
1        0        1        0         0       1
2        0        1        1         0       0
3        1        0        0         1       0
4        1        0        0         0       1

Not exactly the case you are describing, but it can be useful too...
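For a mixed data frame like the one in the question, one possible approach is to run acm.disjonctif on the factor columns only and bind the result back to the remaining columns. This is a rough sketch that assumes the ade4 package is installed; `df.mixed`, `is.fac` and `df.expanded2` are hypothetical names used only here.

```r
library(ade4)

# eggs is declared as a factor explicitly; ham stays numeric.
df.mixed <- data.frame(eggs = factor(c("foo", "foo", "bar", "bar")),
                       ham = c(1, 2, 3, 4))

# Logical vector marking which columns are factors.
is.fac <- sapply(df.mixed, is.factor)

# Expand only the factor columns, then re-attach the others.
# drop = FALSE keeps a one-column selection as a data frame.
df.expanded2 <- cbind(acm.disjonctif(df.mixed[, is.fac, drop = FALSE]),
                      df.mixed[, !is.fac, drop = FALSE])
print(df.expanded2)
```

The indicator columns keep the variable.level naming shown above (eggs.bar, eggs.foo), which sidesteps name clashes when two factors share a level name.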

answered Sep 21 '22 15:09

juba
