
Create 'dummy variables' by spreading duplicate rows into columns in R

Tags: r, dplyr

Thanks in advance for the help.

There are several questions about using spread() (long to wide) on duplicate rows together with unite(), such as this.

I think what makes my question different is that the output needs to be dummy (0/1) variables.

I anticipate an input like so:

df <- data.frame(id = c(1,1,2,3,4), fruit = c("apple","pear","apple","orange","apple"))

And an output like so:

output <- data.frame(id=c(1,2,3,4), apple = c(1,1,0,1), pear = c(1,0,0,0), orange = c(0,0,1,0))

Any help would be greatly appreciated. Thanks.

asked Jan 14 '18 18:01 by ReginaldMilton




2 Answers

Using the tidyverse, you can add a new indicator column and then use spread().

library(tidyverse)

df %>% mutate(i = 1) %>% spread(fruit, i, fill = 0)

# result
  id apple orange pear
1  1     1      0    1
2  2     1      0    0
3  3     0      1    0
4  4     1      0    0
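
For readers on a newer tidyr, where spread() is superseded, the same idea can be sketched with pivot_wider(). This is a minimal sketch, not part of the original answer: it assumes tidyr 1.1.0 or later (for a scalar values_fill), and the distinct() step and the name `dummies` are additions for illustration. distinct() guards against an id listing the same fruit twice, which would otherwise produce list-columns.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 1, 2, 3, 4),
                 fruit = c("apple", "pear", "apple", "orange", "apple"))

dummies <- df %>%
  distinct(id, fruit) %>%            # assumption: repeated id/fruit pairs should count once
  mutate(i = 1) %>%                  # indicator value for every observed pair
  pivot_wider(names_from = fruit,    # one column per fruit
              values_from = i,
              values_fill = 0)       # pairs that never occur become 0

dummies
```

The result is a tibble with one row per id; the fruit columns appear in order of first appearance rather than alphabetically.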
answered Oct 04 '22 23:10 by m0nhawk


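You can use dcast() from the data.table package. Recent data.table versions expect a data.table as input, so df is converted with as.data.table() first.

data.table::dcast(data.table::as.data.table(df),
                  id ~ fruit,
                  fun.aggregate = function(x) 1L,  # any number of matching rows collapses to 1
                  fill = 0L)                       # id/fruit pairs that never occur become 0

Which will return the following values: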

  id apple orange pear
1  1     1      0    1
2  2     1      0    0
3  3     0      1    0
4  4     1      0    0
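
If neither package is available, a similar result can probably be reached in base R with table(). This is a different technique from the two answers, shown only as a sketch; the names `counts` and `dummies` are made up for illustration, and it assumes id is numeric as in the question.

```r
df <- data.frame(id = c(1, 1, 2, 3, 4),
                 fruit = c("apple", "pear", "apple", "orange", "apple"))

# Count how often each id/fruit pair occurs; unclass() leaves a plain integer matrix.
counts <- unclass(table(df$id, df$fruit))

# Any count above zero becomes 1, everything else 0.
dummies <- data.frame(id = as.numeric(rownames(counts)),
                      (counts > 0) * 1L,
                      row.names = NULL)

dummies
```

Because the counts are thresholded, an id that lists the same fruit more than once still gets a 1 rather than a count.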
answered Oct 04 '22 23:10 by clemens