Thanks in advance for the help.
There are several questions about using spread() (from long to wide) on duplicate rows with unite(), such as this one.
I think what makes my question unique is the need to output dummy variables.
I expect an input like this:
df <- data.frame(id = c(1,1,2,3,4), fruit = c("apple","pear","apple","orange","apple"))
And an output like so:
output <- data.frame(id=c(1,2,3,4), apple = c(1,1,0,1), pear = c(1,0,0,0), orange = c(0,0,1,0))
Any help would be greatly appreciated. Thanks.
This recoding is called “dummy coding”. Statistical software such as R does it automatically when a factor is used as a predictor in a model: the factor's contrast matrix determines which 0/1 columns are created for its levels.
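As a minimal base R sketch of that automatic coding, using the question's df: contrasts() shows the default (treatment) contrast matrix, and model.matrix() builds the dummy columns a model would use. Note that this produces one row per observation, not one row per id, so on its own it does not give the asker's output.

```r
df <- data.frame(id = c(1, 1, 2, 3, 4),
                 fruit = c("apple", "pear", "apple", "orange", "apple"))
fruit <- factor(df$fruit)          # levels are sorted: apple, orange, pear

# Treatment contrasts: the first level (apple) is the reference level
cm <- contrasts(fruit)
print(cm)

# With an intercept, one dummy column per non-reference level
mm <- model.matrix(~ fruit)
print(mm)

# Without an intercept, one dummy column for every level
mm_full <- model.matrix(~ fruit - 1)
print(mm_full)
```

The intercept version is what lm() uses by default; dropping the reference level avoids columns that are perfectly collinear with the intercept.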
To convert a categorical variable to dummy variables in the tidyverse, you can use the tidyr function spread(), which takes three main arguments: key, the column whose values become the new column names (here, fruit); value, the column whose values fill those new columns (here, a helper column of 1s); and fill, the value used for combinations that do not occur (here, 0).
A dummy variable in R is a variable that indicates whether an observation has a particular characteristic. It takes the value 1 when the characteristic is present and 0 when it is absent; equivalently, it can be stored as a logical, with TRUE standing for 1 and FALSE for 0.
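A short sketch of the two equivalent encodings on the question's data; the names is_apple and apple_dummy are illustrative only.

```r
df <- data.frame(id = c(1, 1, 2, 3, 4),
                 fruit = c("apple", "pear", "apple", "orange", "apple"))

# Logical indicator: TRUE where the row's fruit is "apple"
is_apple <- df$fruit == "apple"

# The same dummy variable coded as integer 1/0
apple_dummy <- as.integer(is_apple)
apple_dummy
#> [1] 1 0 1 0 1
```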
Using the tidyverse, you can add a new column of 1s and then use spread():
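library(tidyverse)
# add a helper column of 1s, then spread the fruit values into columns (absent pairs filled with 0)
df %>% mutate(i = 1) %>% spread(fruit, i, fill = 0)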
# result
id apple orange pear
1 1 1 0 1
2 2 1 0 0
3 3 0 1 0
4 4 1 0 0
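In current tidyr, spread() is superseded by pivot_wider(). The following is a sketch of the same reshape with it, assuming tidyr 1.1.0 or later (needed for a scalar values_fill and for names_sort). The distinct() step is an addition not in the answer above: it guards against repeated id/fruit rows, which would make spread() fail and make pivot_wider() produce list-columns.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 1, 2, 3, 4),
                 fruit = c("apple", "pear", "apple", "orange", "apple"))

dummies <- df %>%
  distinct(id, fruit) %>%           # keep each id/fruit pair once
  mutate(i = 1) %>%                 # 1 marks "this id has this fruit"
  pivot_wider(names_from = fruit,   # fruit values become column names
              values_from = i,
              values_fill = 0,      # absent pairs become 0
              names_sort = TRUE)    # alphabetical columns: apple, orange, pear
dummies
```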
You can use dcast() from the data.table package.
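data.table::dcast(data.table::as.data.table(df),  # dcast() needs a data.table, not a plain data.frame
                  id ~ fruit,
                  value.var = "fruit",             # column passed to fun.aggregate
                  fun.aggregate = function(x) 1L,  # any id/fruit pair that occurs becomes 1
                  fill = 0L)                       # pairs that never occur become 0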
Which will return a data.table with these values:
id apple orange pear
1 1 1 0 1
2 2 1 0 0
3 3 0 1 0
4 4 1 0 0
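For comparison, here is a base R sketch with no package dependencies (an alternative not given in the answers above): table() counts each fruit per id, and comparing those counts with 0 turns them into presence/absence dummies, so duplicated rows still give 1.

```r
df <- data.frame(id = c(1, 1, 2, 3, 4),
                 fruit = c("apple", "pear", "apple", "orange", "apple"))

# Cross-tabulate: rows are ids, columns are fruits, cells are counts
counts <- table(df$id, df$fruit)

# Collapse counts to 1/0 (unary + turns the logical matrix into integers)
m <- +(unclass(counts) > 0)

# table() stores the ids as row names; move them into an id column
dummies <- data.frame(id = as.numeric(rownames(m)), m, row.names = NULL)
dummies
```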