
Reversing 'one-hot' encoding in Pandas

I want to go from this data frame, which is basically one-hot encoded:

    In [2]: pd.DataFrame({"monkey":[0,1,0,0,0],"rabbit":[1,0,0,0,0],"fox":[0,0,1,0,0]})
    Out[2]:
       fox  monkey  rabbit
    0    0       0       1
    1    0       1       0
    2    1       0       0
    3    0       0       0
    4    0       0       0

To this one, which is 'reverse' one-hot encoded:

    In [3]: pd.DataFrame({"animal":["monkey","rabbit","fox"]})
    Out[3]:
       animal
    0  monkey
    1  rabbit
    2     fox

I imagine there's some sort of clever use of apply or zip to do this, but I'm not sure how. Can anyone help?

I've not had much success using indexing, etc., to solve this problem.

Peadar Coyle asked Jul 12 '16 16:07





1 Answer

UPDATE: I think ayhan is right and it should be:

    df.idxmax(axis=1)

Demo:

    In [40]: s = pd.Series(['dog', 'cat', 'dog', 'bird', 'fox', 'dog'])

    In [41]: s
    Out[41]:
    0     dog
    1     cat
    2     dog
    3    bird
    4     fox
    5     dog
    dtype: object

    In [42]: pd.get_dummies(s)
    Out[42]:
       bird  cat  dog  fox
    0   0.0  0.0  1.0  0.0
    1   0.0  1.0  0.0  0.0
    2   0.0  0.0  1.0  0.0
    3   1.0  0.0  0.0  0.0
    4   0.0  0.0  0.0  1.0
    5   0.0  0.0  1.0  0.0

    In [43]: pd.get_dummies(s).idxmax(1)
    Out[43]:
    0     dog
    1     cat
    2     dog
    3    bird
    4     fox
    5     dog
    dtype: object
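One caveat when applying this to the frame in the question: rows 3 and 4 contain only zeros, and `idxmax(axis=1)` still returns a label for them (the first column, `fox`), because every value in the row ties for the maximum. Below is a minimal sketch, assuming such rows should come back as missing values rather than as a category. The names `labels`, `has_category`, and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

# The one-hot frame from the question, including the two all-zero rows.
df = pd.DataFrame({
    "fox":    [0, 0, 1, 0, 0],
    "monkey": [0, 1, 0, 0, 0],
    "rabbit": [1, 0, 0, 0, 0],
})

# For each row, take the label of the column that holds the maximum value.
labels = df.idxmax(axis=1)

# An all-zero row has no category, but idxmax still reports the first column.
# Assumption: such rows should be treated as missing, so mask them to NaN.
has_category = df.sum(axis=1) > 0
animal = labels.where(has_category)

result = animal.to_frame(name="animal")
print(result)
```

If every row is guaranteed to contain exactly one 1, the mask is unnecessary and plain `df.idxmax(axis=1)` is enough.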

OLD answer (most probably incorrect):

Try this:

    In [504]: df.idxmax().reset_index().rename(columns={'index':'animal', 0:'idx'})
    Out[504]:
       animal  idx
    0     fox    2
    1  monkey    1
    2  rabbit    0

Data:

    In [505]: df
    Out[505]:
       fox  monkey  rabbit
    0    0       0       1
    1    0       1       0
    2    1       0       0
    3    0       0       0
    4    0       0       0
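The difference between the two calls is the axis. With the default `axis=0`, `idxmax` yields one entry per column (the row index of that column's first maximum), whereas `axis=1` yields one entry per row. A small sketch of why the per-column form cannot recover the original series once a category repeats; the sample data here is made up for illustration, and `dtype=int` is passed only so the dummies are integers regardless of pandas version:

```python
import pandas as pd

# "dog" appears twice, so there are 3 rows but only 2 categories.
s = pd.Series(["dog", "cat", "dog"])
dummies = pd.get_dummies(s, dtype=int)

# axis=0 (default): one result per column, i.e. per category.
per_column = dummies.idxmax()

# axis=1: one result per row, which is what reversing the encoding needs.
per_row = dummies.idxmax(axis=1)

print(per_column)
print(per_row)
```

Here `per_column` has two entries and records only the first row where each category occurs, while `per_row` has three entries and matches `s`.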
MaxU - stop WAR against UA answered Oct 08 '22 13:10
