I have a data frame with multiple variables, each of which has multiple categories. I'd like to convert each category into an indicator variable.
V1 V2 V3 V4
xc ab ty ky
xc ab ty kj
xc yi tf kj
cv yi tf kj
bg yt tg kl
bg yu yu kl
and convert it to:
xc cv bg ...
T  F  F  ...
T  F  F  ...
T  F  F  ...
F  T  F  ...
F  F  T  ...
F  F  T  ...
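For anyone who wants to follow along, the example data can be recreated as below. The name `dat` matches the one used in most of the answers. `stringsAsFactors = TRUE` is an assumption here (it was the `data.frame()` default before R 4.0.0); it matters because several answers call `levels()`, which returns NULL for character columns.

```r
# Example data from the question, one column per variable.
# stringsAsFactors = TRUE (assumed) stores each column as a factor.
dat <- data.frame(
  V1 = c("xc", "xc", "xc", "cv", "bg", "bg"),
  V2 = c("ab", "ab", "yi", "yi", "yt", "yu"),
  V3 = c("ty", "ty", "tf", "tf", "tg", "yu"),
  V4 = c("ky", "kj", "kj", "kj", "kl", "kl"),
  stringsAsFactors = TRUE
)

str(dat)
```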
I tried
newframe <- transform(oldframe, xc = to_column(oldframe$V1,'xc'))
where to_column is
to_column = function(col, val){
if (col == val)
'TRUE' else
'FALSE' }
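The attempt above fails because `if` needs a single TRUE or FALSE, while `col == val` returns one value per row. Since R 4.2.0 that is an error ("the condition has length > 1"); earlier versions only warned and used the first element. The function would also return the strings 'TRUE'/'FALSE' rather than logical values. A minimal vectorized sketch that keeps the question's names (`oldframe` is rebuilt here from the V1 column only) could look like this:

```r
# Question's data, reduced to the one column used here
oldframe <- data.frame(V1 = c("xc", "xc", "xc", "cv", "bg", "bg"))

# Vectorized version: `==` already compares element by element and
# returns a logical vector, so no if/else is needed.
to_column <- function(col, val) {
  as.character(col) == val
}

# Inside transform(), columns can be referred to by name
newframe <- transform(oldframe, xc = to_column(V1, "xc"))
newframe$xc
```

This still builds one indicator at a time; the answers below create all of them at once.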
This is one standard approach to creating dummy variables from a categorical variable:
model.matrix( ~ V1 - 1, data=df)
Here df is your data.frame as shown in your question. This returns a 0/1 numeric matrix, where 1 stands for TRUE and 0 for FALSE. Hope that helps!
Best regards,
Jay
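The sketch below (reusing the question's V1 column, stored as a factor) shows what that call returns and why the `- 1` matters: without it, `model.matrix()` applies R's default treatment contrasts, so it adds an intercept column and drops the column for the first level.

```r
dat <- data.frame(
  V1 = c("xc", "xc", "xc", "cv", "bg", "bg"),
  stringsAsFactors = TRUE
)

# "- 1" removes the intercept, so every level of V1 gets its own 0/1 column
mm <- model.matrix(~ V1 - 1, data = dat)
colnames(mm)   # "V1bg" "V1cv" "V1xc"

# With the intercept kept, the first level (bg) becomes the baseline and has no column
colnames(model.matrix(~ V1, data = dat))   # "(Intercept)" "V1cv" "V1xc"

# Each row has exactly one 1
rowSums(mm)
```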
Building on @Jay's answer, we can turn this into a logical matrix.
Logical matrix version:
out <- model.matrix( ~ V1 - 1, data=dat)
out <- matrix(as.logical(out), ncol = ncol(out))
colnames(out) <- with(dat, levels(V1))
> out
bg cv xc
[1,] FALSE FALSE TRUE
[2,] FALSE FALSE TRUE
[3,] FALSE FALSE TRUE
[4,] FALSE TRUE FALSE
[5,] TRUE FALSE FALSE
[6,] TRUE FALSE FALSE
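The extra `matrix()` step is needed because `as.logical()` strips attributes, including `dim`. An equivalent shortcut, sketched below under the same assumption that V1 is a factor (for a character column `levels()` returns NULL, and character is the `data.frame()` default since R 4.0.0), is to compare the model matrix with 1, which keeps the matrix shape.

```r
dat <- data.frame(
  V1 = c("xc", "xc", "xc", "cv", "bg", "bg"),
  stringsAsFactors = TRUE
)
mm <- model.matrix(~ V1 - 1, data = dat)

# as.logical() returns a plain vector: the dim attribute is dropped,
# which is why the answer rebuilds the matrix with matrix(..., ncol = ncol(out)).
flat <- as.logical(mm)

# Comparing with 1 keeps the matrix shape (and row/column names) instead
out <- mm == 1
colnames(out) <- levels(dat$V1)
out
```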
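Version for all variables at once (this assumes the columns of dat are factors, so that levels() works):
# lapply() rather than sapply(), so the result is always a list, even if every variable has the same number of levels
out2 <- lapply(dat, function(x) model.matrix( ~ x - 1))
out2 <- do.call(cbind, out2)
out2 <- matrix(as.logical(out2), ncol = ncol(out2))
colnames(out2) <- unlist(lapply(dat, levels))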
> out2
bg cv xc ab yi yt yu tf tg ty
[1,] FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE
[2,] FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE
[3,] FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
[4,] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE
[5,] TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
[6,] TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
yu kj kl ky
[1,] FALSE FALSE FALSE TRUE
[2,] FALSE TRUE FALSE FALSE
[3,] FALSE TRUE FALSE FALSE
[4,] FALSE TRUE FALSE FALSE
[5,] FALSE FALSE TRUE FALSE
[6,] TRUE FALSE TRUE FALSE
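One thing to watch in the combined result: "yu" is a level of both V2 and V3, so out2 ends up with two columns named yu that can only be told apart by position. A minimal sketch of an alternative that prefixes each column with its variable name follows; `to_indicators()` is a hypothetical helper, and it builds the indicators with `outer()` instead of `model.matrix()`.

```r
dat <- data.frame(
  V1 = c("xc", "xc", "xc", "cv", "bg", "bg"),
  V2 = c("ab", "ab", "yi", "yi", "yt", "yu"),
  V3 = c("ty", "ty", "tf", "tf", "tg", "yu"),
  V4 = c("ky", "kj", "kj", "kj", "kl", "kl")
)

# Hypothetical helper: one logical column per (variable, level) pair,
# named "<variable>_<level>" so shared levels such as "yu" stay distinct.
to_indicators <- function(df, sep = "_") {
  blocks <- lapply(names(df), function(nm) {
    f <- factor(df[[nm]])
    # outer() compares every value with every level: rows = observations, cols = levels
    m <- outer(as.character(f), levels(f), "==")
    colnames(m) <- paste(nm, levels(f), sep = sep)
    m
  })
  do.call(cbind, blocks)
}

ind <- to_indicators(dat)
which(ind[6, ])   # V1_bg, V2_yu, V3_yu, V4_kl
```

Using `outer()` also means the helper works the same whether the columns are factors or character vectors.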
If you don't want a single full matrix like the one above, you can stop after the first line, which gives a list of model matrices, one for each variable (column) in dat, and convert each of them to logical. This one-liner does both steps:
> lapply(lapply(dat, function(x) model.matrix( ~ x - 1)),
+ function(x) matrix(as.logical(x), ncol = ncol(x)))
$V1
[,1] [,2] [,3]
[1,] FALSE FALSE TRUE
[2,] FALSE FALSE TRUE
[3,] FALSE FALSE TRUE
[4,] FALSE TRUE FALSE
[5,] TRUE FALSE FALSE
[6,] TRUE FALSE FALSE
$V2
[,1] [,2] [,3] [,4]
[1,] TRUE FALSE FALSE FALSE
[2,] TRUE FALSE FALSE FALSE
[3,] FALSE TRUE FALSE FALSE
[4,] FALSE TRUE FALSE FALSE
[5,] FALSE FALSE TRUE FALSE
[6,] FALSE FALSE FALSE TRUE
$V3
[,1] [,2] [,3] [,4]
[1,] FALSE FALSE TRUE FALSE
[2,] FALSE FALSE TRUE FALSE
[3,] TRUE FALSE FALSE FALSE
[4,] TRUE FALSE FALSE FALSE
[5,] FALSE TRUE FALSE FALSE
[6,] FALSE FALSE FALSE TRUE
$V4
[,1] [,2] [,3]
[1,] FALSE FALSE TRUE
[2,] TRUE FALSE FALSE
[3,] TRUE FALSE FALSE
[4,] TRUE FALSE FALSE
[5,] FALSE TRUE FALSE
[6,] FALSE TRUE FALSE
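And if the level names are important as column names, we can modify this to:
foo <- function(x) {
mat <- matrix(as.logical(x), ncol = ncol(x))
# x is a model matrix here, so levels(x) is NULL; its column names are "x" followed by the level (e.g. "xbg"), so drop the prefix
colnames(mat) <- sub("^x", "", colnames(x))
mat
}
lapply(lapply(dat, function(x) model.matrix( ~ x - 1)), foo)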
You could have a look at the reshape package; it provides functionality for pivoting data like this. There are examples of its use on the author's homepage.
This is quite straightforward with mtabulate from the "qdap" package:
library(qdap)
mtabulate(split(mydf, 1:nrow(mydf))) > 0
# ab bg cv kj kl ky tf tg ty xc yi
# 1 TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE TRUE FALSE
# 2 TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
# 3 FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE
# 4 FALSE FALSE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
# 5 FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE
# 6 FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
# yt yu
# 1 FALSE FALSE
# 2 FALSE FALSE
# 3 FALSE FALSE
# 4 FALSE FALSE
# 5 TRUE FALSE
# 6 FALSE TRUE
By default, mtabulate tabulates the values (surprise!), so the result is a numeric data.frame. You'll see, for instance, that the count of "yu" in row 6 is actually 2. To get the logical output you want (just presence/absence), compare the values returned by mtabulate with zero, which is what the > 0 above does.
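If you'd rather not depend on qdap, a similar presence/absence table can be built in base R by pairing every value with its row number and cross-tabulating with `table()`. This is a swapped-in technique, not how mtabulate is implemented; the sketch rebuilds the question's data as `mydf`.

```r
mydf <- data.frame(
  V1 = c("xc", "xc", "xc", "cv", "bg", "bg"),
  V2 = c("ab", "ab", "yi", "yi", "yt", "yu"),
  V3 = c("ty", "ty", "tf", "tf", "tg", "yu"),
  V4 = c("ky", "kj", "kj", "kj", "kl", "kl")
)

# Base-R alternative to qdap's mtabulate (not its implementation):
# unlist() stacks the columns, so repeating 1..nrow once per column gives each value's row.
row_id <- rep(seq_len(nrow(mydf)), times = ncol(mydf))
values <- unlist(lapply(mydf, as.character), use.names = FALSE)

counts <- unclass(table(row_id, values))  # integer counts, rows x categories
present <- counts > 0                     # logical presence/absence

counts["6", "yu"]  # 2: "yu" occurs in both V2 and V3 of row 6
```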