Create 'dummy variables' by spreading duplicate rows into columns in R

Tags:

r

dplyr

Thanks in advance for the help.

There are several existing questions about using spread (long to wide) on duplicate rows together with unite, such as this.

I think what makes my question unique is the need to output dummy variables.

My input looks like this:

df <- data.frame(id = c(1,1,2,3,4), fruit = c("apple","pear","apple","orange","apple"))

And this is the output I want:

output <- data.frame(id=c(1,2,3,4), apple = c(1,1,0,1), pear = c(1,0,0,0), orange = c(0,0,1,0))

Any help would be greatly appreciated. Thanks.


asked Jan 14 '18 18:01

ReginaldMilton

2 Answers

Using the tidyverse, you can add a constant indicator column and then use spread(), filling the missing combinations with 0.

library(tidyverse)

df %>%
  mutate(i = 1) %>%               # constant indicator: this id has this fruit
  spread(fruit, i, fill = 0)      # one column per fruit; missing combinations become 0

# result
  id apple orange pear
1  1     1      0    1
2  2     1      0    0
3  3     0      1    0
4  4     1      0    0
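In current tidyr, spread() is superseded by pivot_wider(). Below is a minimal sketch of the same idea with pivot_wider(), assuming a reasonably recent tidyr in which values_fill accepts a single value. The distinct() step is an addition that is not in the answer above: if the same id/fruit pair appeared twice, spread() would stop with a duplicate-identifiers error (and pivot_wider() would warn and build list-columns), so the pairs are de-duplicated first and every dummy stays 0/1.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 1, 2, 3, 4),
                 fruit = c("apple", "pear", "apple", "orange", "apple"))

dummies <- df %>%
  distinct(id, fruit) %>%          # drop repeated id/fruit pairs so each cell is unique
  mutate(i = 1) %>%                # indicator value: this id has this fruit
  pivot_wider(names_from = fruit,  # one new column per fruit
              values_from = i,
              values_fill = 0)     # combinations that never occur become 0

dummies
```

Here the new columns come out as apple, pear, orange, because pivot_wider() orders them by first appearance unless names_sort = TRUE; that happens to match the column order of the question's output.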


answered Oct 04 '22 23:10

m0nhawk

You can use dcast() from the data.table package.

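dt <- data.table::as.data.table(df)   # data.table's dcast() method expects a data.table
res <- data.table::dcast(dt,
                  id ~ fruit,
                  value.var = "fruit",
                  fun.aggregate = function(x) 1L,  # 1 if the id has that fruit at all (not a count)
                  fill = 0L)                       # 0 for combinations that never occur
as.data.frame(res)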

Which will return

  id apple orange pear
1  1     1      0    1
2  2     1      0    0
3  3     0      1    0
4  4     1      0    0
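For comparison, here is a base R sketch that needs no packages, assuming the same df. table() counts how often each fruit occurs per id, and testing counts > 0 turns those counts into 0/1 dummies, so a repeated id/fruit pair still yields 1 (the same effect as fun.aggregate = function(x) 1L in the dcast() call).

```r
df <- data.frame(id = c(1, 1, 2, 3, 4),
                 fruit = c("apple", "pear", "apple", "orange", "apple"))

# Cross-tabulate: rows are ids, columns are fruits, cells are counts
counts <- table(df$id, df$fruit)

# Convert counts to 0/1 indicators (logical * 1L gives integer 0/1)
dummies <- as.data.frame.matrix((counts > 0) * 1L)

# table() keeps the ids as row names; move them into an id column
dummies <- data.frame(id = as.numeric(rownames(dummies)), dummies, row.names = NULL)

dummies
```

table() sorts both the ids and the fruit names, so the columns come out alphabetically (apple, orange, pear), as in both answers.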

answered Oct 04 '22 23:10

clemens
