I want to create 7 dummy variables, one for each day, using dplyr.
So far, I have managed to do it using the sjmisc package and the to_dummy function, but I do it in 2 steps: 1) create a df of dummies, 2) append it to the original df.
#Sample dataframe
mydf<-data.frame(x=rep(letters[1:9]),
                 day=c("Mon","Tues","Wed","Thurs","Fri","Sat","Sun","Fri","Mon"))
#1.Create the 7 dummy variables separately
daysdummy<-sjmisc::to_dummy(mydf$day,suffix="label")
#2. append to dataframe
mydf<-dplyr::bind_cols(mydf,daysdummy)
> mydf
x day day_Fri day_Mon day_Sat day_Sun day_Thurs day_Tues day_Wed
1 a Mon 0 1 0 0 0 0 0
2 b Tues 0 0 0 0 0 1 0
3 c Wed 0 0 0 0 0 0 1
4 d Thurs 0 0 0 0 1 0 0
5 e Fri 1 0 0 0 0 0 0
6 f Sat 0 0 1 0 0 0 0
7 g Sun 0 0 0 1 0 0 0
8 h Fri 1 0 0 0 0 0 0
9 i Mon 0 1 0 0 0 0 0
My question is whether I can do it in one single workflow using dplyr and add to_dummy into the pipe workflow, perhaps using mutate?
*to_dummy documentation
mutate() adds new variables to a data frame and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name. Both functions come from the dplyr package, which has to be installed and loaded before use.
If you want to do this with the pipe, you can do something like:
library(dplyr)
library(sjmisc)
mydf %>%
to_dummy(day, suffix = "label") %>%
bind_cols(mydf) %>%
select(x, day, everything())
Returns:
# A tibble: 9 x 9
  x     day   day_Fri day_Mon day_Sat day_Sun day_Thurs day_Tues day_Wed
  <fct> <fct>   <dbl>   <dbl>   <dbl>   <dbl>     <dbl>    <dbl>   <dbl>
1 a     Mon        0.      1.      0.      0.        0.       0.      0.
2 b     Tues       0.      0.      0.      0.        0.       1.      0.
3 c     Wed        0.      0.      0.      0.        0.       0.      1.
4 d     Thurs      0.      0.      0.      0.        1.       0.      0.
5 e     Fri        1.      0.      0.      0.        0.       0.      0.
6 f     Sat        0.      0.      1.      0.        0.       0.      0.
7 g     Sun        0.      0.      0.      1.        0.       0.      0.
8 h     Fri        1.      0.      0.      0.        0.       0.      0.
9 i     Mon        0.      1.      0.      0.        0.       0.      0.
With dplyr and tidyr we could do:
library(dplyr)
library(tidyr)
mydf %>%
mutate(var = 1) %>%
spread(day, var, fill = 0, sep = "_") %>%
left_join(mydf) %>%
select(x, day, everything())
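spread() has since been superseded in tidyr by pivot_wider(). The following is a minimal sketch of the same idea with pivot_wider(), assuming tidyr 1.1.0 or later (for a scalar values_fill). The helper column day_copy is a hypothetical name introduced here so that the original day column survives the reshape without a join:

```r
library(dplyr)
library(tidyr)

# Sample data frame from the question
mydf <- data.frame(
  x   = letters[1:9],
  day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon")
)

result <- mydf %>%
  # var is the indicator value; day_copy (hypothetical name) is the column
  # that gets spread into the new dummy columns, so day itself is kept
  mutate(var = 1, day_copy = day) %>%
  pivot_wider(
    names_from   = day_copy,
    values_from  = var,
    names_prefix = "day_",
    values_fill  = 0      # days that do not match the row become 0
  )

result
```

The dummy columns appear in order of first occurrence (day_Mon, day_Tues, ...) rather than alphabetically, so add a select() step if a particular column order matters.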
And with base R we could do something like:
as.data.frame.matrix(table(rep(mydf$x, lengths(mydf$day)), unlist(mydf$day)))
Returns:
  Fri Mon Sat Sun Thurs Tues Wed
a   0   1   0   0     0    0   0
b   0   0   0   0     0    1   0
c   0   0   0   0     0    0   1
d   0   0   0   0     1    0   0
e   1   0   0   0     0    0   0
f   0   0   1   0     0    0   0
g   0   0   0   1     0    0   0
h   1   0   0   0     0    0   0
i   0   1   0   0     0    0   0
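Another base R option, not part of the original answer, is model.matrix(), which builds one indicator column per level and can be bound straight back onto the data frame. This is a sketch only; the renaming step is an assumption about the desired column names (day_Fri, day_Mon, ...) taken from the question's output:

```r
# Sample data frame from the question
mydf <- data.frame(
  x   = letters[1:9],
  day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon")
)

# ~ 0 + day drops the intercept, so every day gets its own 0/1 column
dummies <- model.matrix(~ 0 + day, data = mydf)

# model.matrix() names the columns "dayFri", "dayMon", ...;
# rewrite them to "day_Fri", "day_Mon", ... to match the question
colnames(dummies) <- sub("^day", "day_", colnames(dummies))

# Append the dummy columns to the original data frame
result <- cbind(mydf, dummies)

result
```

Unlike the table() approach, this keeps the original x and day columns alongside the dummies in a single step.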