Assume I have:
df = pd.DataFrame({'gender': np.random.choice([1, 2], 10), 'height': np.random.randint(150, 210, 10)})
I'd like to make the gender column categorical. If I try:
df['gender'] = pd.Categorical.from_codes(df['gender'], ['female', 'male'])
it'll fail.
I can pad the categories:
df['gender'] = pd.Categorical.from_codes(df['gender'], ['N/A', 'female', 'male'])
But then 'N/A' is returned in some methods:
In [67]: df['gender'].value_counts()
Out[67]:
female 5
male 5
N/A 0
Name: gender, dtype: int64
I thought about using None as the padding value. It works as intended in value_counts, but I get a warning:
opt/anaconda3/bin/ipython:1: FutureWarning:
Setting NaNs in `categories` is deprecated and will be removed in a future version of pandas.
#!/opt/anaconda3/bin/python
Is there a better way to do this? Also, is there a way to give a mapping from code to category explicitly?
You can use the rename_categories() method:
Demo:
In [33]: df
Out[33]:
gender height
0 1 203
1 2 169
2 2 181
3 1 172
4 2 174
5 1 166
6 2 187
7 2 200
8 1 208
9 1 201
In [34]: df['gender'] = df['gender'].astype('category').cat.rename_categories(['female','male'])
In [35]: df
Out[35]:
gender height
0 female 203
1 male 169
2 male 181
3 female 172
4 male 174
5 female 166
6 male 187
7 male 200
8 female 208
9 female 201
In [36]: df.dtypes
Out[36]:
gender category
height int32
dtype: object
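As for giving the mapping from code to category explicitly, here is a rough sketch of two options. rename_categories() also accepts a dict, so each code can be paired with its label directly. Alternatively, from_codes() works without a padding category if the codes are shifted to start at 0, since it expects zero-based positions into the category list. The name code_to_label and the assignment 1 = female, 2 = male are assumptions based on the category order in the question; the sample values are the ones from the demo above.

```python
import pandas as pd

# Sample data copied from the demo above.
df = pd.DataFrame({
    'gender': [1, 2, 2, 1, 2, 1, 2, 2, 1, 1],
    'height': [203, 169, 181, 172, 174, 166, 187, 200, 208, 201],
})

# Hypothetical mapping (assumes 1 = female, 2 = male, as in the question).
code_to_label = {1: 'female', 2: 'male'}

# Option 1: pass a dict to rename_categories so each code is mapped explicitly.
by_rename = df['gender'].astype('category').cat.rename_categories(code_to_label)

# Option 2: from_codes expects zero-based codes, so shift 1/2 down to 0/1.
zero_based = (df['gender'] - 1).to_numpy()
by_codes = pd.Categorical.from_codes(zero_based, categories=['female', 'male'])

# Neither result carries an unused padding category.
print(by_rename.value_counts())
print(by_codes.categories)
```

Either result can be assigned back with df['gender'] = .... The dict form is probably the safer choice when the codes are not contiguous or do not start at 1.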