Pandas cut with non-unique labels

Tags: python, pandas

I'm trying to bin data and assign a float value to each row based on the bin it falls in. I thought pandas.cut was the tool for this, but apparently it requires every bin label to be unique.

values = [0.6, 0.5, 0.5, 0.6, 0.8, 0.9]
bins = [0, 2, 5, 10, 15, 25, 200]
binned = pd.cut(original_table[field], bins, labels=values)

>>> ValueError: Categorical categories must be unique

My data (original_table) is very large and doing anything iteratively is quite slow, which is why cut was an appealing tool. Is there a workaround to make pd.cut work for this?

asked Jan 19 '26 03:01 by triphook


1 Answer

Here is another way to work around this issue, which I found elsewhere: wrap the labels in a pd.Categorical, which is allowed to hold repeated values. It also looks like this will be fixed in pandas soon.

import pandas as pd
import numpy as np


values = [0.6, 0.5, 0.5, 0.6, 0.8, 0.9]
bins = [0, 2, 5, 10, 15, 25, 200]

# Cut it
binned = pd.cut(original_table[field], bins, labels=pd.Categorical(values))
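
If the Categorical trick does not work on your pandas version, a version-independent sketch is to ask pd.cut for integer bin positions with labels=False and then look the floats up yourself. This is only an illustration under some assumptions: the Series named data is a made-up stand-in for original_table[field], and rows that fall outside the bins are left as NaN.

```python
import numpy as np
import pandas as pd

values = np.array([0.6, 0.5, 0.5, 0.6, 0.8, 0.9])
bins = [0, 2, 5, 10, 15, 25, 200]

# Hypothetical sample data standing in for original_table[field]
data = pd.Series([1, 3, 7, 12, 20, 100, 500])

# labels=False makes cut return the integer position of each row's bin
# (NaN for rows that fall outside every bin)
codes = pd.cut(data, bins, labels=False)

# Look up the float for each in-range row; out-of-range rows stay NaN
in_range = codes.notna()
binned = pd.Series(np.nan, index=data.index)
binned[in_range] = values[codes[in_range].astype(int).to_numpy()]

print(binned)
```

Everything here is vectorized, so it should stay fast on a large table. As far as I know, newer pandas releases (1.1 and later) also let you pass ordered=False to pd.cut together with the duplicate labels, which may be the fix mentioned above.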

answered Jan 21 '26 17:01 by Arleg


