Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating categorical variables from mutually exclusive dummy variables

My question regards an elaboration on a previously answered question about combining multiple dummy variables into a single categorical variable.

In the question previously asked, the categorical variable was created from dummy variables that were NOT mutually exclusive. For my case, my dummy variables are mutually exclusive because they represent crossed experimental conditions in a 2X2 between-subjects factorial design (that also has a within subjects component which I'm not addressing here), so I don't think interaction does what I need to do.

For example, my data might look like this:

id   conditionA    conditionB    conditionC     conditionD
1    NA            1             NA             NA
2    1             NA            NA             NA
3    NA            NA            1              NA
4    NA            NA            NA             1
5    NA            2             NA             NA
6    2             NA            NA             NA
7    NA            NA            2              NA
8    NA            NA            NA             2

I'd like to now make categorical variables that combine ACROSS different types of conditions. For example, people who had values for condition A and B might be coded with one categorical variable, and people who had values for condition C and D.

id   conditionA    conditionB    conditionC     conditionD  factor1    factor2
1    NA            1             NA             NA          1          NA
2    1             NA            NA             NA          1          NA
3    NA            NA            1              NA          NA         1
4    NA            NA            NA             1           NA         1
5    NA            2             NA             NA          2          NA
6    2             NA            NA             NA          2          NA
7    NA            NA            2              NA          NA         2
8    NA            NA            NA             2           NA         2

Right now, I'm doing this using ifelse() statements, which quite simply is a hot mess (and doesn't always work). Please help! There's probably some super-obvious "easier way."

EDIT:

The kinds of ifelse commands that I am using are as follows:

attach(df)
df$factor<-ifelse(conditionA==1 | conditionB==1, 1, NA)
df$factor<-ifelse(conditionA==2 | conditionB==2, 2, df$factor)

In reality, I'm combining across 6-8 columns each time, so a more elegant solution would help a lot.

like image 963
roody Avatar asked Apr 21 '13 19:04

roody


2 Answers

Update (2019): Please use dplyr::coalesce(), it works pretty much the same.

My R package has a convenience function that allows to choose the first non-NA value for each element in a list of vectors:

#library(devtools)
#install_github('kimisc', 'muelleki')
library(kimisc)

df$factor1 <- with(df, coalesce.na(conditionA, conditionB))

(I'm not sure if this works if conditionA and conditionB are factors. Convert them to numerics before using as.numeric(as.character(...)) if necessary.)

Otherwise, you could give interaction a try, combined with recoding of the levels of the resulting factor -- but to me it looks like you're more interested in the first solution:

df$conditionAB <- with(df, interaction(coalesce.na(conditionA, 0), 
                                       coalesce.na(conditionB, 0)))
levels(df$conditionAB) <- c('A', 'B')
like image 191
krlmlr Avatar answered Oct 30 '22 10:10

krlmlr


I think this function gives you what you need (admittedly, this is a quick hack).

to_indicator <- function(x, grp)
{
    apply(tbl, 1,
          function (x)
          {
              idx <- which(!is.na(x))
              nm <- names(idx)
              if (nm %in% grp)
                x[idx]
              else
                NA
          })
}

And here is it's used with the example data you provide.

tbl <- read.table(header=TRUE, text="
conditionA    conditionB    conditionC     conditionD
NA            1             NA             NA
1             NA            NA             NA
NA            NA            1              NA
NA            NA            NA             1
NA            2             NA             NA
2             NA            NA             NA
NA            NA            2              NA
NA            NA            NA             2")
tbl <- data.frame(tbl)

(tbl <- cbind(tbl,
              factor1=to_indicator(tbl, c("conditionA", "conditionB")),
              factor2=to_indicator(tbl, c("conditionC", "conditionD"))))
like image 44
Jason Morgan Avatar answered Oct 30 '22 10:10

Jason Morgan