How to create a factor from a binary indicator matrix?

Tags:

r

Say I have the following matrix mat, which is a binary indicator matrix for the levels A, B, and C for a set of 5 observations:

mat <- matrix(c(1,0,0,
                1,0,0,
                0,1,0,
                0,1,0,
                0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]

> mat
     A B C
[1,] 1 0 0
[2,] 1 0 0
[3,] 0 1 0
[4,] 0 1 0
[5,] 0 0 1

I want to convert that into a single factor such that the output is equivalent to fac, defined as:

> fac <- factor(rep(LETTERS[1:3], times = c(2,2,1)))
> fac
[1] A A B B C
Levels: A B C

Extra points if you get the labels from the colnames of mat, but a set of numeric codes (e.g. c(1,1,2,2,3)) would also be acceptable as desired output.

Gavin Simpson asked Oct 11 '11 14:10


4 Answers

Elegant solution with matrix multiplication (and shortest up to now):

as.factor(colnames(mat)[mat %*% 1:ncol(mat)])
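
To see why the multiplication works: when every row contains exactly one 1, the product of a row with 1:ncol(mat) is the column position of that 1. The sketch below spells this out step by step, under that one-1-per-row assumption; the names idx and res are illustrative, and passing levels = colnames(mat) is an optional addition that keeps levels which happen not to occur in the data.

```r
# Rebuild the example matrix from the question
mat <- matrix(c(1,0,0,
                1,0,0,
                0,1,0,
                0,1,0,
                0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]

# Each row times 1:ncol(mat) yields the column position of its single 1;
# drop() turns the resulting 5 x 1 matrix into a plain vector
idx <- drop(mat %*% seq_len(ncol(mat)))

# Index the column names with those positions, keeping all columns as levels
res <- factor(colnames(mat)[idx], levels = colnames(mat))
res
```

A row with no 1 would produce index 0, and a row with several 1s would produce the sum of their positions, so this is only a sketch for clean indicator matrices.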

Tomas answered Nov 11 '22 19:11


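This solution uses the arr.ind=TRUE argument of which, which returns the matching positions as (row, col) array locations. The col values are then used to index the colnames. Note that which reports matches in column-major order, so the result follows the original observation order only when the rows are already grouped by level, as they are in mat: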

> factor(colnames(mat)[which(mat==1, arr.ind=TRUE)[, 2]])
[1] A A B B C
Levels: A B C

Decomposing into steps:

> which(mat==1, arr.ind=TRUE)
     row col
[1,]   1   1
[2,]   2   1
[3,]   3   2
[4,]   4   2
[5,]   5   3

Take the values of the second column, i.e. which(...)[, 2], and use them to index colnames(mat):

> colnames(mat)[c(1, 1, 2, 2, 3)]
[1] "A" "A" "B" "B" "C"

Finally, convert the result to a factor with factor().
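
A minimal sketch of a variant that keeps the observations in row order even when the rows are not grouped by level. It sorts the which() result on its row column before indexing, since which() lists matches column by column. The shuffled matrix mat2 and the names w, naive, and safe are illustrative and not part of the original answer.

```r
# Example matrix from the question
mat <- matrix(c(1,0,0,
                1,0,0,
                0,1,0,
                0,1,0,
                0,0,1), ncol = 3, byrow = TRUE)
colnames(mat) <- LETTERS[1:3]

# Shuffle the rows so the observations read C, A, B, A, B
mat2 <- mat[c(5, 1, 3, 2, 4), ]

# Positions of the 1s, listed column by column
w <- which(mat2 == 1, arr.ind = TRUE)

# Without reordering, the labels come back grouped by column
naive <- factor(colnames(mat2)[w[, "col"]])

# Sorting by row first restores the observation order
w <- w[order(w[, "row"]), , drop = FALSE]
safe <- factor(colnames(mat2)[w[, "col"]], levels = colnames(mat2))

naive
safe
```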

Andrie answered Nov 11 '22 20:11


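One way is to repeat each column name nrow(mat) times, so that the names line up with the matrix elements in column-major order, index that vector with the matrix coerced to logical, and wrap the result in factor to get the levels. Because the indexing runs column by column, the result matches the observation order only when the rows are grouped by level, as in mat: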

> factor(rep(colnames(mat), each = nrow(mat))[as.logical(mat)])
[1] A A B B C
Levels: A B C

If the matrix came from model.matrix, the column names have the variable name (here fac) prepended, so the same approach should work once that prefix is stripped:

factor(gsub("^fac", "", rep(colnames(mat), each = nrow(mat))[as.logical(mat)]))
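
As an illustration of the model.matrix case, the sketch below builds an indicator matrix from fac and then strips the prefix again. The formula ~ 0 + fac (no intercept, so every level gets its own column) and the names mm and rebuilt are assumptions made for this example; the original answer does not say how the matrix was produced.

```r
# The target factor from the question
fac <- factor(rep(LETTERS[1:3], times = c(2, 2, 1)))

# Indicator matrix without an intercept; its columns are facA, facB, facC
mm <- model.matrix(~ 0 + fac)
colnames(mm)

# Same indexing as above, then remove the "fac" prefix from the labels
rebuilt <- factor(gsub("^fac", "",
                       rep(colnames(mm), each = nrow(mm))[as.logical(mm)]))
rebuilt
```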

mdsumner answered Nov 11 '22 20:11


You could use something like this:

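# For each row, find the position of the first 1 (NA if the row has none)
lvls <- apply(mat, 1, function(currow) match(1, currow))
# Map those positions to the column names, without hard-coding the level count
fac <- factor(lvls, levels = seq_len(ncol(mat)), labels = colnames(mat))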
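
One property of the row-wise match approach is that a row without any 1 becomes NA instead of being silently dropped or mislabelled. The sketch below shows this on a small made-up matrix (mat_na, not from the question) and compares it with base R's max.col, a vectorised alternative that is not part of the original answers and that assigns such a row to the first column.

```r
# Hypothetical matrix whose second row has no 1 at all
mat_na <- matrix(c(1,0,0,
                   0,0,0,
                   0,0,1), ncol = 3, byrow = TRUE)
colnames(mat_na) <- LETTERS[1:3]

# Row-wise match: the all-zero row yields NA
codes_match <- apply(mat_na, 1, function(currow) match(1, currow))
fac_match <- factor(codes_match, levels = seq_len(ncol(mat_na)),
                    labels = colnames(mat_na))

# max.col picks the position of the row maximum; with ties.method = "first"
# an all-zero row is assigned to column 1
codes_maxcol <- max.col(mat_na, ties.method = "first")

fac_match
codes_maxcol
```

Which behaviour is preferable depends on whether all-zero rows can occur in the data and how they should be treated.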

Nick Sabbe answered Nov 11 '22 18:11