Split character column into several binary (0/1) columns

Tags:

split

r

I have a character vector like this:

a <- c("a,b,c", "a,b", "a,b,c,d")

What I would like to do is create a data frame where the individual letters in each string are represented by dummy columns:

     a    b    c    d
[1,] 1    1    1    0
[2,] 1    1    0    0
[3,] 1    1    1    1

I have a feeling that I need to use some combination of read.table and reshape, but I am really struggling. Any help is appreciated.

gh0strider18 asked May 01 '15 13:05



2 Answers

You can try cSplit_e from my "splitstackshape" package:

library(splitstackshape)
a <- c("a,b,c", "a,b", "a,b,c,d")
cSplit_e(as.data.table(a), "a", ",", type = "character", fill = 0)
#          a a_a a_b a_c a_d
# 1:   a,b,c   1   1   1   0
# 2:     a,b   1   1   0   0
# 3: a,b,c,d   1   1   1   1
cSplit_e(as.data.table(a), "a", ",", type = "character", fill = 0, drop = TRUE)
#    a_a a_b a_c a_d
# 1:   1   1   1   0
# 2:   1   1   0   0
# 3:   1   1   1   1

There's also mtabulate from "qdapTools":

library(qdapTools)
mtabulate(strsplit(a, ","))
#   a b c d
# 1 1 1 1 0
# 2 1 1 0 0
# 3 1 1 1 1

A very direct base R approach is to use table along with stack and strsplit:

table(rev(stack(setNames(strsplit(a, ",", fixed = TRUE), seq_along(a)))))
#    values
# ind a b c d
#   1 1 1 1 0
#   2 1 1 0 0
#   3 1 1 1 1
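
The one-liner above packs several steps together, and it returns a table rather than the data frame the question asks for. A step-by-step sketch of the same idea, assuming the same vector a, might look like this. The names parts, long, tab and dummies are illustrative only and do not come from the answer:

```r
a <- c("a,b,c", "a,b", "a,b,c,d")

# Split each string on literal commas and name the pieces by row position.
parts <- setNames(strsplit(a, ",", fixed = TRUE), seq_along(a))

# stack() turns the named list into a long data frame with columns
# `values` (the letter) and `ind` (the row it came from).
long <- stack(parts)

# Cross-tabulate row index against letter to get the counts.
tab <- table(long$ind, long$values)

# as.data.frame.matrix() keeps the wide layout; plain as.data.frame()
# on a table would melt it back into long form.
dummies <- as.data.frame.matrix(tab)
dummies
```

Note that table counts occurrences, so a letter repeated within one string would give a value above 1 instead of a strict 0/1 flag.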
A5C1D2H2I1M1N2O1R2T1 answered Oct 21 '22 07:10



Another convoluted base-R solution:

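# Split each string into its letters
x  <- strsplit(a, ",")
# Collect every distinct letter to use as factor levels
xl <- unique(unlist(x))

# Tabulate each row against the full level set, so absent letters count as 0
t(sapply(x, function(z) table(factor(z, levels = xl))))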

which gives

     a b c d
[1,] 1 1 1 0
[2,] 1 1 0 0
[3,] 1 1 1 1
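
The result above is a matrix, while the question asks for a data frame. A possible variant, offered only as a sketch, tests membership with %in% instead of tabulating and then converts the result. The names m and dummy_df are made up for this example:

```r
a <- c("a,b,c", "a,b", "a,b,c,d")

x  <- strsplit(a, ",", fixed = TRUE)
xl <- unique(unlist(x))

# For each row, flag which of the known letters are present.
# vapply() declares the return type and length up front, unlike sapply().
m <- t(vapply(x, function(z) as.integer(xl %in% z), integer(length(xl))))
colnames(m) <- xl

# Convert the integer matrix to a data frame with one column per letter.
dummy_df <- as.data.frame(m)
dummy_df
```

Because %in% only tests presence, every cell is strictly 0 or 1 even if a letter appears twice in one string, which differs from the table-based version when duplicates exist.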
Frank answered Oct 21 '22 09:10
