I have a dataframe with a categorical column.
I remove all the rows that have one of the categories.
How can I make sure the resulting dataframe keeps only the categories that still occur in the data, and does not keep the deleted category in the column's categories?
import numpy as np
import pandas as pd
df = pd.DataFrame({'color': np.random.choice(['Blue', 'Green', 'Brown', 'Red'], 50)})
df['color'] = df['color'].astype('category')
df['color'].head()
Output:
0     Blue
1    Green
2     Blue
3    Green
4    Brown
Name: color, dtype: category
Categories (4, object): [Blue, Brown, Green, Red]
Remove the Brown rows from the dataframe, then remove Brown from the column's categories:
# .copy() avoids a SettingWithCopyWarning when the column is reassigned below
df = df.query('color != "Brown"').copy()
df['color'] = df['color'].cat.remove_categories('Brown')
df['color'].head()
Output:
0     Blue
1    Green
2     Blue
3    Green
7      Red
Name: color, dtype: category
Categories (3, object): [Blue, Green, Red]
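If you would rather not name the removed category explicitly, pandas also provides `Series.cat.remove_unused_categories()`, which drops every category that no longer occurs in the data. The sketch below is an alternative to the approach above, not a replacement for it. It uses a small fixed list of colors (an assumption, chosen so the result is reproducible) instead of the random sample.

```python
import pandas as pd

# Fixed sample data (assumption: replaces the random sample so the result is deterministic)
df = pd.DataFrame({'color': ['Blue', 'Green', 'Brown', 'Red'] * 3})
df['color'] = df['color'].astype('category')

# Filtering the rows alone does not change the categories
df = df[df['color'] != 'Brown'].copy()
categories_after_filter = list(df['color'].cat.categories)

# Drop every category that no longer appears in the column
df['color'] = df['color'].cat.remove_unused_categories()
categories_after_cleanup = list(df['color'].cat.categories)

print(categories_after_filter)   # still includes 'Brown'
print(categories_after_cleanup)  # 'Brown' is gone
```

This is handy when several categories may have been filtered out at once, since you do not need to list them.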