Suppose I have the following DataFrame:
Area
0 14.68
1 40.54
2 10.82
3 2.31
4 22.3
I want to categorize these values into ranges, like A: [1, 10], B: [11, 20], C..., to get something like:
Area
0 B
1 D
2 C
3 A
4 C
How can I do this with pandas? I tried the following code:
import numpy as np
import pandas as pd
bins = pd.IntervalIndex.from_tuples([(0, 11), (11, 20), (20, 50), (50, 100), (100, 500), (500, np.max(df["Area"]) + 1)], closed='left')
catDf = pd.cut(df["Area"], bins=bins)
But cut just puts the range (interval) values in the DataFrame, and I want to put the category names there instead of the ranges.
EDIT: I tried passing labels to cut, but nothing changed.
EDIT2: To clarify: if the value of "area" is 10.21, it falls in the range [10, 20], so it must be labeled "B" (or whatever label belongs to that range of values).
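A likely reason passing labels changed nothing: the pandas documentation for pd.cut says the labels argument is ignored when bins is an IntervalIndex. A minimal sketch, assuming you keep IntervalIndex bins and just rename the resulting interval categories afterwards. The top bin (500, np.inf) is my assumption: with the sample data, np.max(df["Area"]) + 1 is 41.54, which would give an invalid interval (500, 41.54). The column name "Category" is also illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Area": [14.68, 40.54, 10.82, 2.31, 22.3]})

# Left-closed bins as in the question; the open-ended top bin is an assumption
bins = pd.IntervalIndex.from_tuples(
    [(0, 11), (11, 20), (20, 50), (50, 100), (100, 500), (500, np.inf)],
    closed="left",
)

# With IntervalIndex bins, the categories are the intervals themselves, in bin order
binned = pd.cut(df["Area"], bins=bins)

# Replace each interval category with a letter (one label per bin, same order)
df["Category"] = binned.cat.rename_categories(list("ABCDEF"))
print(df)
```

Because rename_categories keeps the underlying codes, each row keeps its bin and only the displayed category name changes.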
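What works for me is cat.codes, used to index into the list a after converting it to a NumPy array:
a = list('ABCDEF')  # one label per bin, in bin order
# cat.codes is each value's bin position (0 = first bin), used as an index into a
df['new'] = np.array(a)[pd.cut(df["Area"], bins=bins).cat.codes]
print(df)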
Area new
0 14.68 B
1 40.54 C
2 10.82 A
3 2.31 A
4 22.30 C
5 600.00 F
Or, to get a separate Series instead of a new column:
catDf = pd.Series(np.array(a)[pd.cut(df["Area"], bins=bins).cat.codes], index=df.index)
print(catDf)
0 B
1 C
2 A
3 A
4 C
5 F
dtype: object
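One caveat with the cat.codes approach: a value that falls outside every bin gets code -1, and np.array(a)[-1] silently returns the last label ('F') instead of flagging the miss. A minimal sketch of a guard, assuming you prefer a missing value for out-of-range rows. The extra rows 600.0 and -5.0 and the top bin (500, 601) are illustrative assumptions, not from the question.

```python
import numpy as np
import pandas as pd

# Question's data plus two illustrative rows: one in the top bin, one below every bin
df = pd.DataFrame({"Area": [14.68, 40.54, 10.82, 2.31, 22.3, 600.0, -5.0]})

bins = pd.IntervalIndex.from_tuples(
    [(0, 11), (11, 20), (20, 50), (50, 100), (100, 500), (500, 601)],
    closed="left",
)
labels = np.array(list("ABCDEF"), dtype=object)

codes = pd.cut(df["Area"], bins=bins).cat.codes.to_numpy()

# codes == -1 marks values in no bin; mask them instead of letting
# labels[-1] quietly map them to the last label
df["new"] = np.where(codes >= 0, labels[codes], None)
print(df)
```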
You can specify the labels like this (note: I'm not sure which ranges you used):
pd.cut(df.Area, [1, 10, 20, 50, 100], labels=['A', 'B', 'C', 'D'])
0 B
1 C
2 B
3 A
4 C
Name: Area, dtype: category
Categories (4, object): [A < B < C < D]
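By default pd.cut builds right-closed intervals such as (10, 20]. A minimal sketch, assuming you want the left-closed ranges from the question's IntervalIndex ([0, 11), [11, 20), ...): pass plain edges with right=False together with labels. The open-ended top edge np.inf is an assumption standing in for the question's np.max(...) + 1.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Area": [14.68, 40.54, 10.82, 2.31, 22.3]})

# Edges of [0, 11), [11, 20), [20, 50), [50, 100), [100, 500), [500, inf)
edges = [0, 11, 20, 50, 100, 500, np.inf]

# right=False makes each interval closed on the left; labels needs len(edges) - 1 entries
df["Category"] = pd.cut(df["Area"], bins=edges, right=False, labels=list("ABCDEF"))
print(df)
```

Unlike IntervalIndex bins, plain edges let pd.cut honour labels directly, so no renaming step is needed.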