
"Bin labels must be one fewer than the number of bin edges" after passing pd.qcut duplicates='drop' kwarg

Tags: python, pandas

I have a df with hundreds of thousands of rows, and am creating a new dataframe which only contains the top quantile of rows for some group of values:

quantiles = (df.groupby(['Person', 'Date'])['Value'].apply(lambda x: pd.qcut(x, 4, labels=[0, 0.25, 0.5, 1], duplicates='drop')))

When I run it, I get:

ValueError: Bin labels must be one fewer than the number of bin edges

After trying to change the number of bins to 5 I still get the same error.

How can I fix this?

MSD asked Dec 25 '19 21:12


1 Answer

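I ran into the same issue, and this is how I got around it.

q (the number of bins) = how many quantile slices you ask pd.qcut to cut the data into.

labels = the names given to the resulting bins, one per bin.

With duplicates='drop', repeated bin edges are removed, so you can end up with fewer bins than you asked for. The error appears whenever the number of labels does not match the number of bins that remain. That is also why raising the number of bins to 5 does not help on its own.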
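To make the mismatch concrete, here is a minimal sketch on made-up data (the `values` series is hypothetical, not the asker's df). It uses `retbins=True` to read back the edges that survive `duplicates='drop'`, then shows that passing one label per requested bin raises the error.

```python
import pandas as pd

# Hypothetical data: zero is so common that several quantile edges coincide.
values = pd.Series([0] * 6 + [10, 20, 30, 40])

# retbins=True also returns the bin edges left after duplicates are dropped.
_, edges = pd.qcut(values, q=5, duplicates="drop", retbins=True)
n_bins = len(edges) - 1  # fewer than the 5 bins requested

# Five labels no longer fit the bins that remain, so qcut raises.
try:
    pd.qcut(values, q=5, labels=["a", "b", "c", "d", "e"], duplicates="drop")
    message = None
except ValueError as err:
    message = str(err)

print(n_bins, message)
```

Here six of the ten values are zero, so three of the six requested edges collapse into one and only three bins are left.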

Follow these steps:

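Step 1: Don't pass labels at first, and check which bins you actually get

# Bin without labels and keep the original column untouched for Step 2
binned = pd.qcut(train['MasVnrArea'], q=5, duplicates='drop')
binned.value_counts()

This will result in: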

(-0.001, 16.0]     880
(205.2, 1600.0]    292
(16.0, 205.2]      288
Name: MasVnrArea, dtype: int64

Step 2: Pass as many labels as there are bins

Now we can see that only three bins remain after the duplicate edges are dropped, even though q=5 was requested. So, assign labels accordingly. In my case that is 3, so I am passing 3 labels.

bin_labels_MasVnrArea = ['Platinum_MasVnrArea',
                         'Diamond_MasVnrArea',
                         'Supreme_MasVnrArea']
train['MasVnrArea'] = pd.qcut(train['MasVnrArea'], q=5,
                              labels=bin_labels_MasVnrArea,
                              duplicates='drop')
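If you would rather not count the bins by hand, the two steps can be folded into one function. The sketch below is only an illustration: `qcut_with_labels` is a hypothetical helper name, not part of pandas, and it assumes that keeping the first few labels is acceptable for your data.

```python
import pandas as pd


def qcut_with_labels(values, q, labels):
    """Hypothetical helper: qcut that uses only as many labels as bins remain."""
    # First pass without labels, to learn how many distinct edges survive.
    _, edges = pd.qcut(values, q=q, duplicates="drop", retbins=True)
    n_bins = len(edges) - 1
    # Second pass with a label list of matching length (assumption: the
    # first n_bins labels are the ones you want to keep).
    return pd.qcut(values, q=q, labels=labels[:n_bins], duplicates="drop")


# Made-up data with many repeated zeros, similar in spirit to MasVnrArea.
values = pd.Series([0] * 6 + [10, 20, 30, 40])
binned = qcut_with_labels(values, q=5, labels=["a", "b", "c", "d", "e"])
print(binned.value_counts())
```

Truncating the label list is a design choice, not the only option. If the labels carry meaning (for example quantile levels such as 0.25 or 0.5), you may prefer to pick them explicitly after inspecting the edges. In a groupby like the one in the question, each group can end up with a different number of bins, so this check would have to run per group.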

Please watch this video on bins for a clear understanding.

https://www.youtube.com/watch?v=HofOMf8RgjM
Amit Bidlan answered Oct 02 '22 21:10