I have a pandas DataFrame with a numerical column "amount". The amount varies from 0 to 20000. I want to convert it into a categorical variable that defines a range. So, the categorical variable would be:
I am unable to figure out how to change the column. I can convert it to binary values like this:
months["value"] = np.where(months['amount']>=450, 'yes', 'no')
But how do I do this for a categorical variable with more than two values?
Using the standard pandas Categorical constructor, we can create a categorical object. The second argument specifies the categories, so any value that is not present in the categories is treated as NaN. If the categorical is ordered, the order of the categories is meaningful: logically, a is greater than b and b is greater than c.
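As a minimal sketch of the constructor described above: the sample values and the category list here are assumptions chosen for illustration, not taken from the original post. The category list is written from lowest to highest, and `ordered=True` is what makes comparisons between categories meaningful.

```python
import pandas as pd

# Illustrative values (assumed); 'd' is deliberately not among the categories.
values = ['a', 'b', 'c', 'a', 'b', 'c', 'd']

# The second argument lists the allowed categories, lowest first.
# ordered=True declares the ordering c < b < a.
cat = pd.Categorical(values, categories=['c', 'b', 'a'], ordered=True)

print(cat)           # 'd' shows up as NaN because it is not a listed category
print(cat.codes)     # integer codes per value; -1 marks the missing entry
print(cat.max())     # largest category under the declared order
```

Because 'd' is not in the category list, it is stored as missing rather than raising an error.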
You can use cut:
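import pandas as pd

df = pd.DataFrame({'B':[4000,5000,4000,9000,5,11040]})
# bin edges every 1000 from 0 to 20000; each bin is open on the left, closed on the right
df['D'] = pd.cut(df['B'], range(0, 21000, 1000))
print(df)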
       B               D
0   4000    (3000, 4000]
1   5000    (4000, 5000]
2   4000    (3000, 4000]
3   9000    (8000, 9000]
4      5       (0, 1000]
5  11040  (11000, 12000]
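If you want named categories instead of interval objects, pd.cut also accepts explicit bin edges and a labels list. The sketch below is only an illustration: the bin edges, label strings, and sample amounts are assumptions, since the question does not list the exact ranges. Note that the bins are open on the left by default, so an amount of exactly 0 would become NaN unless include_lowest=True is passed.

```python
import pandas as pd

# Sample data (assumed); the column name "amount" follows the question.
months = pd.DataFrame({'amount': [0, 450, 4000, 9000, 20000]})

# Hypothetical bin edges and labels; replace them with the ranges you need.
bins = [0, 1000, 5000, 10000, 20000]
labels = ['0-1000', '1000-5000', '5000-10000', '10000-20000']

# There must be one label fewer than there are bin edges.
# include_lowest=True makes the first bin include its left edge (0).
months['range'] = pd.cut(months['amount'], bins=bins,
                         labels=labels, include_lowest=True)
print(months)
```

The resulting column has the category dtype, and its categories are ordered by default, so the ranges sort and compare in bin order.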