
Convert a factor column to multiple boolean columns

Tags: r, data.table

Given data that looks like:

library(data.table)
DT <- data.table(x=rep(1:5, 2))

I would like to split this data into 5 boolean columns that indicate the presence of each number.

I can do this like this:

new.names <- sort(unique(DT$x))

DT[, (paste0('col', new.names)) := lapply(new.names, function(i) x == i)]

But this uses a pesky lapply, which is probably slower than a native data.table alternative, and this solution strikes me as not very "data.table-ish".

Is there a better and/or faster way to create these new columns?

Asked by Justin, Jul 05 '12 18:07


2 Answers

How about model.matrix?

model.matrix(~factor(x)-1,data=DT)

   factor(x)1 factor(x)2 factor(x)3 factor(x)4 factor(x)5
1           1          0          0          0          0
2           0          1          0          0          0
3           0          0          1          0          0
4           0          0          0          1          0
5           0          0          0          0          1
6           1          0          0          0          0
7           0          1          0          0          0
8           0          0          1          0          0
9           0          0          0          1          0
10          0          0          0          0          1
attr(,"assign")
[1] 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$`factor(x)`
[1] "contr.treatment"

Apparently, you can put model.matrix into [.data.table to give the same results. Not sure if it would be faster:

DT[,model.matrix(~factor(x)-1)]
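
Note that model.matrix returns a numeric 0/1 matrix rather than boolean columns. A minimal sketch of turning that matrix into logical columns on DT follows; the col1..col5 names are taken from the question, and the intermediate name mm is only illustrative:

```r
library(data.table)

DT <- data.table(x = rep(1:5, 2))

# model.matrix() returns a numeric 0/1 matrix; comparing with 1 yields logicals
mm <- model.matrix(~ factor(x) - 1, data = DT) == 1

# Rename the columns to col1..col5 (naming taken from the question)
colnames(mm) <- paste0("col", levels(factor(DT$x)))

# Add the logical columns to DT by reference
DT[, (colnames(mm)) := as.data.table(mm)]
```

Parenthesising the left-hand side of := makes data.table treat it as a character vector of column names.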
Answered by BenBarnes, Nov 18 '22 17:11


There is also nnet::class.ind:

library(nnet)

cbind(DT, setnames(as.data.table(DT[, class.ind(x)]), paste0('col', sort(unique(DT$x)))))
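
Neither answer measures speed, so here is a rough sketch of how the three approaches could be timed against each other. The larger table big and its size are hypothetical, chosen only to make timings visible; results will vary by machine and package version, so no numbers are claimed here:

```r
library(data.table)
library(nnet)

# Hypothetical larger input, used only for timing
set.seed(1)
big <- data.table(x = sample(1:5, 1e5, replace = TRUE))
lv <- sort(unique(big$x))

t_lapply <- system.time(a <- big[, lapply(lv, function(i) x == i)])
t_mm     <- system.time(b <- big[, model.matrix(~ factor(x) - 1)])
t_ci     <- system.time(d <- big[, class.ind(x)])

# All three encode the same indicators (logical vs. numeric 0/1)
stopifnot(all(as.matrix(a) == b), all(b == d))

# Elapsed seconds per approach
print(rbind(lapply = t_lapply, model.matrix = t_mm, class.ind = t_ci)[, "elapsed"])
```

For anything beyond a quick check, a dedicated benchmarking package with repeated runs would give more reliable numbers than a single system.time call.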
Answered by mnel, Nov 18 '22 16:11