Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert (many) integer-valued rows into binary indicator columns using Pandas

I am working on a task that seems to me a little like one-hot encoding, but notably different. What I want to do is take a row of integers from a Pandas DataFrame and produce a binary column with 1's at the index locations specified by the integers and 0's everywhere else. If possible, I would like to do this for many rows at the same time. So a trivial example would be given as taking

index A B C
0 1 4 7
1 2 5 8
2 3 6 9

and producing

index .0 .1 .2
0 0 0 0
1 1 0 0
2 0 1 0
3 0 0 1
4 1 0 0
5 0 1 0
6 0 0 1
7 1 0 0
8 0 1 0
9 0 0 1

I have tried

new_df = pandas.DataFrame(range(old_df.max(axis=None)+1)).isin(list(old_df.iloc[0]))

which works for a single row (the first row of the old_df in this case), but doesn't seem easily scalable to an arbitrary number of rows. Is there a built in function that does something similar to this?

like image 413
lane-h-rogers Avatar asked May 20 '26 20:05

lane-h-rogers


1 Answers

Another possible solution:

pd.get_dummies(df.stack()).groupby(level=0).sum().T

First, it uses stack to pivot the data from a wide to a long format, creating a series with a MultiIndex that pairs original row and column positions. This long-format series is then fed into get_dummies to perform the one-hot encoding, converting the integer values into new binary columns. Subsequently, groupby(level=0).sum() aggregates these results by summing the binary indicators for each of the original rows (the first level of the MultiIndex). Finally, the .T attribute pivots the resulting dataframe.

Output:

   0  1  2
1  1  0  0
2  0  1  0
3  0  0  1
4  1  0  0
5  0  1  0
6  0  0  1
7  1  0  0
8  0  1  0
9  0  0  1

Intermediates:


# (df.stack(), 
#  pd.get_dummies(df.stack()),  
#  [g for g in pd.get_dummies(df.stack()).groupby(level=0)], 
#  pd.get_dummies(df.stack()).groupby(level=0).sum())

(0  A    1
    B    4
    C    7
 1  A    2
    B    5
    C    8
 2  A    3
    B    6
    C    9
 dtype: int64,
          1      2      3      4      5      6      7      8      9
 0 A   True  False  False  False  False  False  False  False  False
   B  False  False  False   True  False  False  False  False  False
   C  False  False  False  False  False  False   True  False  False
 1 A  False   True  False  False  False  False  False  False  False
   B  False  False  False  False   True  False  False  False  False
   C  False  False  False  False  False  False  False   True  False
 2 A  False  False   True  False  False  False  False  False  False
   B  False  False  False  False  False   True  False  False  False
   C  False  False  False  False  False  False  False  False   True,
 [(0,
            1      2      3      4      5      6      7      8      9
   0 A   True  False  False  False  False  False  False  False  False
     B  False  False  False   True  False  False  False  False  False
     C  False  False  False  False  False  False   True  False  False),
  (1,
            1      2      3      4      5      6      7      8      9
   1 A  False   True  False  False  False  False  False  False  False
     B  False  False  False  False   True  False  False  False  False
     C  False  False  False  False  False  False  False   True  False),
  (2,
            1      2      3      4      5      6      7      8      9
   2 A  False  False   True  False  False  False  False  False  False
     B  False  False  False  False  False   True  False  False  False
     C  False  False  False  False  False  False  False  False   True)],
    1  2  3  4  5  6  7  8  9
 0  1  0  0  1  0  0  1  0  0
 1  0  1  0  0  1  0  0  1  0
 2  0  0  1  0  0  1  0  0  1)

like image 129
PaulS Avatar answered May 22 '26 20:05

PaulS



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!