
Extract the names of columns whose value is 1 (binary values), join them with a delimiter, and put the result in a new column

I have a dataframe of binary values, produced by running get_dummies in pandas:

df= 
Values  A1  A2  B1  B2  B3  B4  C1  C2  C3
10      1   0   1   0   0   0   1   0   0
12      0   1   0   0   1   0   0   1   0
3       0   1   0   1   0   0   0   0   1
5       1   0   0   0   0   1   1   0   0

I want a new column that contains, for each row, the names of all columns holding a 1, joined with ~~.

Expected output:

Values  A1  A2  B1  B2  B3  B4  C1  C2  C3  Combination
10      1   0   1   0   0   0   1   0   0   A1~~B1~~C1
12      0   1   0   0   1   0   0   1   0   A2~~B3~~C2
3       0   1   0   1   0   0   0   0   1   A2~~B2~~C3
5       1   0   0   0   0   1   1   0   0   A1~~B4~~C1

The actual matrix can have 25,000+ rows and 1,000+ columns.

A similar solution exists in R, but I need it in Python because all my other dependencies are in Python and R is new to me:

Extract column names with value 1 in binary matrix

The R code is below. I need something similar, or any other Python code that produces my expected output.
Solution 1:
as.matrix(apply(m==1,1,function(a) paste0(colnames(m)[a], collapse = "")))

Solution 2:
t <- which(m==1, arr.ind = TRUE)
as.matrix(aggregate(col~row, cbind(row=rownames(t), col=t[,2]), function(x) 
                                                    paste0(colnames(m)[x], collapse = "")))

How can I do something similar, or otherwise arrive at my expected output, in Python?

Anand Thirtha asked Dec 14 '22 07:12



1 Answer

df["Combination"] = df.iloc[:, 1:].dot(df.add_suffix("~~").columns[1:]).str[:-2]

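We select every column except Values with iloc, then take a dot product whose second operand is the corresponding column names of df with ~~ appended. Multiplying a name by 1 keeps it and multiplying it by 0 yields an empty string, so each row's sum concatenates the names of the columns that hold a 1. The result also carries a trailing ~~ at the very end, so we chop it off with .str[:-2]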

to get

   Values  A1  A2  B1  B2  B3  B4  C1  C2  C3 Combination
0      10   1   0   1   0   0   0   1   0   0  A1~~B1~~C1
1      12   0   1   0   0   1   0   0   1   0  A2~~B3~~C2
2       3   0   1   0   1   0   0   0   0   1  A2~~B2~~C3
3       5   1   0   0   0   0   1   1   0   0  A1~~B4~~C1
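
For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the same dot-product idea. The names SEP, dummies, labels, via_dot and via_join are illustrative and do not come from the question or the answer. The second variant (a boolean mask plus str.join per row) is not the method above but a swapped-in alternative that roughly mirrors the R apply/paste0 approach. It is included only as a cross-check, with no claim about which one is faster on a 25,000 x 1,000 matrix.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame(
    {
        "Values": [10, 12, 3, 5],
        "A1": [1, 0, 0, 1],
        "A2": [0, 1, 1, 0],
        "B1": [1, 0, 0, 0],
        "B2": [0, 0, 1, 0],
        "B3": [0, 1, 0, 0],
        "B4": [0, 0, 0, 1],
        "C1": [1, 0, 0, 1],
        "C2": [0, 1, 0, 0],
        "C3": [0, 0, 1, 0],
    }
)

SEP = "~~"  # delimiter used in the expected output

# Only the 0/1 indicator columns take part in the combination.
dummies = df.drop(columns="Values")

# Variant 1: dot product against the column names with SEP appended.
# 1 * "A1~~" == "A1~~" and 0 * "A1~~" == "", so summing across a row
# concatenates the names of the columns that hold a 1.
labels = np.array([name + SEP for name in dummies.columns], dtype=object)
via_dot = dummies.dot(labels).str[: -len(SEP)]  # drop the trailing SEP

# Variant 2 (alternative, for cross-checking): boolean mask + join per row.
names = dummies.columns.to_numpy()
mask = dummies.to_numpy().astype(bool)
via_join = pd.Series([SEP.join(names[row]) for row in mask], index=df.index)

df["Combination"] = via_dot
print(df)
```

Slicing with len(SEP) rather than a hard-coded 2 keeps the trailing-delimiter cleanup correct if a different delimiter is chosen.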
Mustafa Aydın answered Jan 03 '23 10:01
