How to rename categories after using pandas.cut with IntervalIndex?

Tags: python, pandas

I discretized a column in my dataframe using pandas.cut with bins created by IntervalIndex.from_tuples.

The cut works as intended; however, the categories are shown as the intervals I specified in the IntervalIndex. Is there any way to rename the categories to different labels, e.g. (Small, Medium, Large)?

Example:

bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)

The resulting categories will be:

[NaN, (0, 1], NaN, (2, 3], (4, 5]]
Categories (3, interval[int64]): [(0, 1] < (2, 3] < (4, 5]]

I am trying to change [(0, 1] < (2, 3] < (4, 5]] into something like 1, 2, 3 or small, medium, large.

Sadly, the labels argument of pd.cut is ignored when bins is an IntervalIndex.

Thanks!

UPDATE:

Thanks to @SergeyBushmanov, I noticed that this issue only exists when trying to change category labels inside a dataframe (which is what I am trying to do). Updated example:

In [1]: df = pd.DataFrame([0, 0.5, 1.5, 2.5, 4.5], columns = ['col1'])
In [2]: bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
In [3]: df['col1'] = pd.cut(df['col1'], bins)
In [4]: df['col1'].categories = ['small','med','large']

In [5]: df['col1']

Out [5]:
0       NaN
1    (0, 1]
2       NaN
3    (2, 3]
4    (4, 5]
Name: col1, dtype: category
Categories (3, interval[int64]): [(0, 1] < (2, 3] < (4, 5]]

Yaser Baqi asked Mar 17 '19 06:03


2 Answers

If we have some data:

bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
x = pd.cut([0, 0.5, 1.5, 2.5, 4.5], bins)

In older versions of pandas, you can reassign the categories directly:

In [7]: x.categories = [1,2,3]

In [8]: x   
Out[8]: 
[NaN, 1, NaN, 2, 3]
Categories (3, int64): [1 < 2 < 3]

or:

In [9]: x.categories = ["small", "medium", "big"]                         

In [10]: x                                             
Out[10]: 
[NaN, small, NaN, medium, big]
Categories (3, object): [small < medium < big]

UPDATE:

df = pd.DataFrame([0, 0.5, 1.5, 2.5, 4.5], columns = ['col1'])
bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])
x = pd.cut(df["col1"].to_list(),bins)
x.categories = [1,2,3]
df['col1'] = x
df.col1
0    NaN
1      1
2    NaN
3      2
4      3
Name: col1, dtype: category
Categories (3, int64): [1 < 2 < 3]

UPDATE 2:

In newer versions of pandas, instead of reassigning categories with x.categories = [1, 2, 3], use rename_categories: call x.rename_categories on a Categorical such as x, or use the .cat.rename_categories accessor on a Series:

labels = [1, 2, 3]
x = x.rename_categories(labels)

labels can be of any type, as long as there is one unique label per category. In every case, the original categorical order that was set when creating the pd.IntervalIndex is preserved.
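
For the dataframe case from the question, the same idea can be applied through the Series .cat accessor. This is a minimal sketch, assuming a recent pandas version; the intermediate name binned is only illustrative and does not appear in the original posts.

```python
import pandas as pd

df = pd.DataFrame({"col1": [0, 0.5, 1.5, 2.5, 4.5]})
bins = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])

# pd.cut on a Series returns a Series with category dtype
binned = pd.cut(df["col1"], bins)

# rename_categories returns a new Series, so assign the result back
df["col1"] = binned.cat.rename_categories(["small", "medium", "large"])

print(df["col1"])
```

Values that fall outside every interval (0 and 1.5 here) stay NaN, and the renamed categories keep the order of the original intervals.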


Sergey Bushmanov answered Sep 20 '22 05:09


import pandas as pd

series = pd.Series([0, 0.5, 1.5, 2.5, 4.5])

bins = [(0, 1), (2, 3), (4, 5)]
index = pd.IntervalIndex.from_tuples(bins)
intervals = index.values
names = ['small', 'med', 'large']
# Map each Interval to its new label
to_name = {interval: name for interval, name in zip(intervals, names)}

# Cut with the IntervalIndex, then rename the categories via the mapping
named_series = pd.Series(
    pd.CategoricalIndex(pd.cut(series, index)).rename_categories(to_name)
)
print(named_series)

0      NaN
1    small
2      NaN
3      med
4    large
dtype: category
Categories (3, object): ['small' < 'med' < 'large']
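
As a variation on the dictionary mapping above, rename_categories also accepts a callable that is applied to each old category. The sketch below derives labels from the interval bounds; the helper name interval_label and the "0-1" label format are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

series = pd.Series([0, 0.5, 1.5, 2.5, 4.5])
index = pd.IntervalIndex.from_tuples([(0, 1), (2, 3), (4, 5)])


def interval_label(interval):
    # Hypothetical helper: build a label such as "0-1" from the bounds
    return f"{int(interval.left)}-{int(interval.right)}"


# The callable is invoked once per category (each a pd.Interval)
named = pd.cut(series, index).cat.rename_categories(interval_label)

print(named)
```

This avoids building the mapping by hand when the labels can be computed from the intervals themselves.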

lunalcni answered Sep 17 '22 05:09