Create categorical variable from mutually exclusive dummy variables [duplicate]

How can I create a categorical variable from mutually exclusive dummy variables (taking values 0/1)?

Basically, I am looking for the exact opposite of this solution: (https://subscription.packtpub.com/book/big_data_and_business_intelligence/9781787124479/1/01lvl1sec22/creating-dummies-for-categorical-variables).

I would appreciate a base R solution.

For example, I have the following data:

dummy.df <- structure(c(1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 
                        0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 
                        0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L), 
            .Dim = c(10L, 4L), 
            .Dimnames = list(NULL, c("State.NJ", "State.NY", "State.TX", "State.VA")))

> dummy.df
          State.NJ State.NY State.TX State.VA
     [1,]        1        0        0        0
     [2,]        0        1        0        0
     [3,]        1        0        0        0
     [4,]        0        0        0        1
     [5,]        0        1        0        0
     [6,]        0        0        1        0
     [7,]        1        0        0        0
     [8,]        0        0        0        1
     [9,]        0        0        1        0
    [10,]        0        0        0        1

I would like to get the following result:

   state
1     NJ
2     NY
3     NJ
4     VA
5     NY
6     TX
7     NJ
8     VA
9     TX
10    VA

In dput() form:

cat.var <- structure(list(state = structure(c(1L, 2L, 1L, 4L, 2L, 3L, 1L, 
4L, 3L, 4L), .Label = c("NJ", "NY", "TX", "VA"), class = "factor")), 
                    class = "data.frame", row.names = c(NA, -10L))
asked Feb 28 '26 08:02 by ECII


1 Answer

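# toy data: three mutually exclusive dummies (exactly one 1 per row)
df <- data.frame(a = c(1,0,0,0,0), b = c(0,1,0,1,0), c = c(0,0,1,0,1))

# for each row (MARGIN = 1), return the name of the column that holds the 1
df$cat <- apply(df, 1, function(i) names(df)[which(i == 1)])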

Result:

> df
  a b c cat
1 1 0 0   a
2 0 1 0   b
3 0 0 1   c
4 0 1 0   b
5 0 0 1   c
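A vectorized alternative to the row-wise apply() call, sketched under the assumption that every row contains exactly one 1, is base R's max.col(), which returns the column index of each row's maximum. This version also returns a factor (as the question asks) rather than a character vector, and checks the mutual-exclusivity assumption first:

```r
# toy data: three mutually exclusive dummies (exactly one 1 per row)
df <- data.frame(a = c(1,0,0,0,0), b = c(0,1,0,1,0), c = c(0,0,1,0,1))

m <- as.matrix(df)

# Guard the assumption: with no 1 or several 1s in a row the labels would be wrong.
stopifnot(all(rowSums(m) == 1))

# max.col() gives, for each row, the index of the column holding the row maximum,
# i.e. the position of the single 1; colnames() turns that index into a label.
df$cat <- factor(colnames(m)[max.col(m, ties.method = "first")],
                 levels = colnames(m))
df
```

Here df$cat holds a, b, c, b, c, stored as a factor with levels a, b, c.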

To generalize, you need to adjust the df and names(df) parts so they refer only to the dummy columns, but you get the drift. One option is to wrap this in a function, e.g.:

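catmaker <- function(data, varnames, catname) {

  # drop = FALSE keeps a one-column selection as a data frame so apply() still works;
  # which() indexes the columns in the same order as varnames
  data[, catname] <- apply(data[, varnames, drop = FALSE], 1,
                           function(i) varnames[which(i == 1)])

  return(data)

}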

newdf <- catmaker(data = df, varnames = c("a", "b", "c"), catname = "newcat")

One nice aspect of this function is that it is robust to the order of the names in varnames: because data[, varnames] selects the columns in that same order, the index returned by which() always points at the matching name. That is, varnames = c("c", "a", "b") produces the same result as varnames = c("a", "b", "c").
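One quick way to check the order-robustness claim is to run the function with two permutations of varnames and compare the new columns. This sketch repeats catmaker() so it runs on its own; the drop = FALSE argument is an addition that keeps a single dummy column from collapsing to a plain vector.

```r
catmaker <- function(data, varnames, catname) {
  # columns are selected in varnames order, so which() indexes varnames directly
  data[, catname] <- apply(data[, varnames, drop = FALSE], 1,
                           function(i) varnames[which(i == 1)])
  data
}

df <- data.frame(a = c(1,0,0,0,0), b = c(0,1,0,1,0), c = c(0,0,1,0,1))

r1 <- catmaker(df, varnames = c("a", "b", "c"), catname = "newcat")
r2 <- catmaker(df, varnames = c("c", "a", "b"), catname = "newcat")

# Same labels regardless of the order of varnames
identical(r1$newcat, r2$newcat)
```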

P.S. You added example data after I posted this. The function also works on your example, as long as you convert dummy.df (a matrix) to a data frame first: catmaker(data = as.data.frame(dummy.df), varnames = colnames(dummy.df), catname = "State") does the job.
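Note that catmaker() labels each row with the full column name (State.NJ, State.NY, ...) and returns a character column, whereas the question asks for a factor with levels NJ, NY, TX, VA. A sketch that closes that gap, applying the same apply()/which() idea directly to the matrix and then stripping the prefix with sub(); the "State." prefix pattern is an assumption based on the example's column names, and state.df is just an illustrative name:

```r
dummy.df <- structure(c(1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L,
                        0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L,
                        0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L),
                      .Dim = c(10L, 4L),
                      .Dimnames = list(NULL, c("State.NJ", "State.NY", "State.TX", "State.VA")))

# Full column name of the single 1 in each row (assumes exactly one 1 per row)
labels <- apply(dummy.df, 1, function(i) colnames(dummy.df)[which(i == 1)])

# Drop the "State." prefix from both the labels and the level set
lvls <- sub("^State\\.", "", colnames(dummy.df))
state.df <- data.frame(state = factor(sub("^State\\.", "", labels), levels = lvls))
state.df
```

Taking the levels from the column names keeps them in column order (and includes a state even if no row uses it), instead of relying on factor()'s default sorting.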

answered Mar 02 '26 14:03 by ulfelder