Updated: With apologies to those who replied, in my original example I overlooked the fact that data.frame() created var as a factor rather than as a character vector, as I had intended. I have corrected the example, and this will break at least one of the answers.
--original--
I have a data frame that I'm performing a series of dplyr and tidyr manipulations on, and I would like to add columns for indicator variables encoded as 0 or 1, and do this within the dplyr chain. Each level of a factor (presently stored as character vectors) should be encoded in a separate column, and each column name is a concatenation of a fixed prefix with the variable level, e.g. where var has level a, the new column var_a will be 1, and all other rows of var_a will be 0.
The following minimal example using base R produces exactly the results that I want (thanks to this blog post), but I'd like to roll it all into the dplyr chain, and can't quite figure out how to do it.
    library(dplyr)
    df <- data.frame(var = sample(x = letters[1:4], size = 10, replace = TRUE), stringsAsFactors = FALSE)
    for (level in unique(df$var)) {
      df[paste("var", level, sep = "_")] <- ifelse(df$var == level, 1, 0)
    }
Note that the real data set contains multiple columns, none of which should be altered or dropped when creating the indicator variables, with the exception of the column var, which could be converted to type factor.
It's not pretty, but this function should work:
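    dummy <- function(data, col) {
      for (nm in col) {
        idx <- which(names(data) == nm)
        v <- data[[idx]]
        # Accept character columns as well by converting them to factors
        if (!is.factor(v)) v <- factor(v)
        # One row per observation, one column per factor level
        m <- matrix(0, nrow = nrow(data), ncol = nlevels(v))
        # Set cell (i, level code of v[i]) to 1 for every row i
        m[cbind(seq_along(v), as.integer(v))] <- 1
        colnames(m) <- paste(nm, levels(v), sep = "_")
        r <- data.frame(m)
        # Re-attach the columns to the left and right of the expanded one
        if (idx > 1) {
          r <- cbind(data[1:(idx - 1)], r)
        }
        if (idx < ncol(data)) {
          r <- cbind(r, data[(idx + 1):ncol(data)])
        }
        data <- r
      }
      data
    }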
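The central step in the function above is the assignment through a two-column index matrix: each row of the index is a (row, column) pair, so one assignment sets exactly one cell per observation. A minimal sketch of just that step, using a made-up factor v purely for illustration:

```r
# Hypothetical example factor; its levels sort to "a", "b", "c"
v <- factor(c("b", "a", "c", "a"))

# Start with an all-zero matrix: one row per element, one column per level
m <- matrix(0, nrow = length(v), ncol = nlevels(v))

# Each row of idx is a (row, column) pair; the column is the level code of v[i]
idx <- cbind(seq_along(v), as.integer(v))
m[idx] <- 1

colnames(m) <- paste("v", levels(v), sep = "_")
m
```

Every row of m then contains a single 1, in the column that matches that element's level.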
Here's a sample data.frame
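    dd <- data.frame(a = runif(30),
                     b = sample(letters[1:3], 30, replace = TRUE),
                     c = rnorm(30),
                     d = sample(letters[10:13], 30, replace = TRUE),
                     stringsAsFactors = TRUE  # since R 4.0.0 strings are no longer factors by default
    )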
and you specify the columns you want to expand as a character vector. You can do
dd %>% dummy("b")
or
dd %>% dummy(c("b","d"))
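If the source column should stay in place and is a character vector, as in the updated question, one possible alternative is to wrap the loop from the question in a small function so that it can be dropped into a pipe. This is only a sketch: add_indicators and its prefix argument are hypothetical names, not part of dplyr or tidyr, and the example data is made up.

```r
library(dplyr)

# Hypothetical helper: appends one 0/1 column per distinct value of `col`
# and leaves every existing column, including `col` itself, untouched.
add_indicators <- function(data, col, prefix = col) {
  # sort() gives a stable column order and drops NA values
  for (level in sort(unique(data[[col]]))) {
    data[[paste(prefix, level, sep = "_")]] <- as.integer(data[[col]] == level)
  }
  data
}

df <- data.frame(id = 1:5,
                 var = c("a", "b", "a", "c", "b"),
                 stringsAsFactors = FALSE)

out <- df %>%
  add_indicators("var")
out
```

Because the function takes the data frame as its first argument and returns a data frame, it can sit between other dplyr verbs in the same chain.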