I have a dataframe with binary values, produced by running get_dummies in pandas:
df=
Values A1 A2 B1 B2 B3 B4 C1 C2 C3
10 1 0 1 0 0 0 1 0 0
12 0 1 0 0 1 0 0 1 0
3 0 1 0 1 0 0 0 0 1
5 1 0 0 0 0 1 1 0 0
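For anyone who wants to reproduce this, the sample frame above can be rebuilt roughly as follows. This is only a sketch: the values are copied by hand from the table, and the integer dtype is an assumption (recent pandas versions of get_dummies may return booleans instead).

```python
import pandas as pd

# Sample data copied from the table above; column order matters,
# because the combined string follows the order of the columns.
df = pd.DataFrame(
    {
        "Values": [10, 12, 3, 5],
        "A1": [1, 0, 0, 1],
        "A2": [0, 1, 1, 0],
        "B1": [1, 0, 0, 0],
        "B2": [0, 0, 1, 0],
        "B3": [0, 1, 0, 0],
        "B4": [0, 0, 0, 1],
        "C1": [1, 0, 0, 1],
        "C2": [0, 1, 0, 0],
        "C3": [0, 0, 1, 0],
    }
)
print(df)
```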
I want a new column that, for each row, joins the names of all columns that contain a 1.
Expected output:
Values A1 A2 B1 B2 B3 B4 C1 C2 C3 Combination
10 1 0 1 0 0 0 1 0 0 A1~~B1~~C1
12 0 1 0 0 1 0 0 1 0 A2~~B3~~C2
3 0 1 0 1 0 0 0 0 1 A2~~B2~~C3
5 1 0 0 0 0 1 1 0 0 A1~~B4~~C1
The actual matrix can be 25,000+ rows by 1,000+ columns.
I found a similar solution in R ("Extract column names with value 1 in binary matrix"), but I need it in Python because all my other dependencies are in Python and R is new to me.
The R code is below. I need something similar, or any other Python code that gets me to the expected output.
Solution 1:
as.matrix(apply(m==1,1,function(a) paste0(colnames(m)[a], collapse = "")))
Solution 2:
t <- which(m==1, arr.ind = TRUE)
as.matrix(aggregate(col~row, cbind(row=rownames(t), col=t[,2]), function(x)
paste0(colnames(m)[x], collapse = "")))
How can I do something similar, or otherwise arrive at my expected output, in Python?
df["Combination"] = df.iloc[:, 1:].dot(df.add_suffix("~~").columns[1:]).str[:-2]
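We select every column except Values with iloc, and then form a dot product whose second operand is the corresponding column names of df with ~~ added to the end of each. Multiplying a name by 1 keeps it and multiplying by 0 gives an empty string, so the dot product concatenates the names of the columns that hold a 1. The result has ~~ at the very end as well, so we chop it off with .str[:-2] to get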
Values A1 A2 B1 B2 B3 B4 C1 C2 C3 Combination
0 10 1 0 1 0 0 0 1 0 0 A1~~B1~~C1
1 12 0 1 0 0 1 0 0 1 0 A2~~B3~~C2
2 3 0 1 0 1 0 0 0 0 1 A2~~B2~~C3
3 5 1 0 0 0 0 1 1 0 0 A1~~B4~~C1
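To see why the dot product produces strings, here is a plain-Python sketch of what happens for the first row. It is only an illustration of the mechanism, not what pandas runs internally; the names `suffixed`, `first_row`, `pieces`, and `combined` are made up for this example.

```python
# Column names with "~~" appended, as df.add_suffix("~~").columns[1:] would give
suffixed = ["A1~~", "A2~~", "B1~~", "B2~~", "B3~~", "B4~~", "C1~~", "C2~~", "C3~~"]

# The 0/1 flags of the first row (Values == 10)
first_row = [1, 0, 1, 0, 0, 0, 1, 0, 0]

# int * str repeats the string: 1 * "A1~~" == "A1~~" and 0 * "A2~~" == ""
pieces = [flag * name for flag, name in zip(first_row, suffixed)]

# The "sum" part of the dot product is string concatenation
combined = "".join(pieces)

# Drop the trailing "~~", which is the job of .str[:-2] in the one-liner
result = combined[:-2]
print(result)  # A1~~B1~~C1
```

A row with no 1's at all ends up as an empty string, since slicing an empty string with [:-2] is still empty.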