
Pandas DataFrame: How to convert binary columns into one categorical column?

Given a pandas DataFrame, how does one convert several binary columns (where 1 denotes the value exists, 0 denotes it doesn't) into a single categorical column?

Another way to think of this: how does one perform the reverse of pd.get_dummies()?

Here is an example of converting a categorical column into several binary columns:

import pandas as pd
s = pd.Series(list('ABCDAB'))
df = pd.get_dummies(s)
df
   A  B  C  D
0  1  0  0  0
1  0  1  0  0
2  0  0  1  0
3  0  0  0  1
4  1  0  0  0
5  0  1  0  0

What I would like to know is, given a dataframe

df1
   A  B  C  D
0  1  0  0  0
1  0  1  0  0
2  0  0  1  0
3  0  0  0  1
4  1  0  0  0
5  0  1  0  0

how do I convert it into

df1
   A  B  C  D   category
0  1  0  0  0   A
1  0  1  0  0   B
2  0  0  1  0   C
3  0  0  0  1   D
4  1  0  0  0   A
5  0  1  0  0   B
ShanZhengYang asked Apr 12 '17 23:04



1 Answer

One way would be to use idxmax to find the 1s:

In [32]: df["category"] = df.idxmax(axis=1)

In [33]: df
Out[33]: 
   A  B  C  D category
0  1  0  0  0        A
1  0  1  0  0        B
2  0  0  1  0        C
3  0  0  0  1        D
4  1  0  0  0        A
5  0  1  0  0        B
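
A self-contained version of the same idea is sketched below, with two hedged additions that are not part of the answer above: restricting idxmax to the indicator columns (the `dummy_cols` list is an assumed name), and masking rows that contain no 1 at all, since idxmax would otherwise report the first column for them. The second frame, `df2`, is made-up data used only to show that caveat.

```python
import pandas as pd

# Rebuild the example frame from the question. dtype=int keeps the 0/1
# display shown above (newer pandas versions default to booleans).
df = pd.get_dummies(pd.Series(list("ABCDAB")), dtype=int)

# Restrict idxmax to the indicator columns so the line can be re-run
# safely after "category" has been added to the frame.
dummy_cols = ["A", "B", "C", "D"]
df["category"] = df[dummy_cols].idxmax(axis=1)
print(df)

# Caveat: a row with no 1 still gets the first column label from idxmax.
# Masking such rows leaves them as NaN instead (illustrative data only).
df2 = pd.DataFrame({"A": [1, 0, 0], "B": [0, 1, 0]})
has_flag = df2.eq(1).any(axis=1)
df2["category"] = df2.idxmax(axis=1).where(has_flag)
print(df2)
```

If a row can contain more than one 1, idxmax keeps only the first matching column, so this approach assumes the columns are mutually exclusive.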
DSM answered Oct 07 '22 01:10