
pd.Categorical.from_codes with missing values

Tags:

pandas

Assume I have:

df = pd.DataFrame({'gender': np.random.choice([1, 2], 10), 'height': np.random.randint(150, 210, 10)})

I'd like to make the gender column categorical. If I try:

df['gender'] = pd.Categorical.from_codes(df['gender'], ['female', 'male'])

it fails with a ValueError, because the codes must be zero-based indices into the categories.

I can pad the categories:
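
A minimal sketch of why the call fails and one possible way around it. It assumes the codes are exactly 1 and 2 (fixed here instead of random so the result is reproducible); the names `gender_codes` and `failed` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Illustrative fixed codes standing in for np.random.choice([1, 2], 10).
gender_codes = np.array([1, 2, 2, 1, 2])
categories = ['female', 'male']

# from_codes treats each code as a zero-based position in `categories`
# and reserves -1 for missing values, so code 2 is out of range here.
try:
    pd.Categorical.from_codes(gender_codes, categories)
    failed = False
except ValueError:
    failed = True

# Shifting the codes down by one maps 1 -> 'female' and 2 -> 'male'
# without padding the category list.
gender = pd.Categorical.from_codes(gender_codes - 1, categories)
```

The shift only works when the codes are consecutive integers starting at 1; gaps in the coding would need an explicit mapping instead.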

df['gender'] = pd.Categorical.from_codes(df['gender'], ['N/A', 'female', 'male'])

But then 'N/A' is returned in some methods:

In [67]: df['gender'].value_counts()
Out[67]: 
female    5
male      5
N/A       0
Name: gender, dtype: int64
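
If padding is kept, one option is to drop the placeholder afterwards with `remove_unused_categories()`. This is a sketch under the assumption that every real category occurs at least once in the data, since any category with no rows is dropped as well; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative fixed codes; 0 (the 'N/A' slot) never occurs.
codes = np.array([1, 2, 2, 1, 2])

# Pad position 0 so that codes 1 and 2 line up with 'female' and 'male'.
padded = pd.Categorical.from_codes(codes, ['N/A', 'female', 'male'])

# Drop every category that has no rows, which removes the 'N/A' placeholder.
trimmed = padded.remove_unused_categories()

counts = pd.Series(trimmed).value_counts()
```

After this, `value_counts()` reports only the categories that are actually present.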

I thought about using None as the padding value. It works as intended in value_counts(), but I get a warning:

opt/anaconda3/bin/ipython:1: FutureWarning: 
Setting NaNs in `categories` is deprecated and will be removed in a future version of pandas.
  #!/opt/anaconda3/bin/python

Is there a better way to do this? Also, is there a way to give an explicit mapping from code to category?

asked Mar 11 '26 15:03 by lazy1


1 Answer

You can use the rename_categories() method:

Demo:

In [33]: df
Out[33]:
   gender  height
0       1     203
1       2     169
2       2     181
3       1     172
4       2     174
5       1     166
6       2     187
7       2     200
8       1     208
9       1     201

In [34]: df['gender'] = df['gender'].astype('category').cat.rename_categories(['female', 'male'])

In [35]: df
Out[35]:
   gender  height
0  female     203
1    male     169
2    male     181
3  female     172
4    male     174
5  female     166
6    male     187
7    male     200
8  female     208
9  female     201

In [36]: df.dtypes
Out[36]:
gender    category
height       int32
dtype: object
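
A list passed to rename_categories() is matched by position against the sorted codes, so the labels depend on getting the order right. For the explicit code-to-category mapping the question asks about, a dict can be passed instead. The sketch below assumes 1 means 'female' and 2 means 'male', as the question's padded category list implies; `code_to_label` is an illustrative name.

```python
import pandas as pd

# Illustrative frame using the first five rows of the demo data.
df = pd.DataFrame({'gender': [1, 2, 2, 1, 2],
                   'height': [203, 169, 181, 172, 174]})

# Assumed coding: 1 = female, 2 = male.
code_to_label = {1: 'female', 2: 'male'}

# Convert the integer codes to a categorical, then rename each category
# through the dict so the mapping does not rely on list order.
df['gender'] = (
    df['gender']
    .astype('category')
    .cat.rename_categories(code_to_label)
)
```

Codes that are missing from the dict keep their original integer label, so it is worth checking `df['gender'].cat.categories` afterwards.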

answered Mar 14 '26 03:03 by MaxU - stop WAR against UA


