I have a df with hundreds of thousands of rows, and I am creating a new dataframe which contains only the top quantile of rows for some group of values:
quantiles = (df.groupby(['Person', 'Date'])['Value'].apply(lambda x: pd.qcut(x, 4, labels=[0, 0.25, 0.5, 1], duplicates='drop')))
When I run it, I get:
ValueError: Bin labels must be one fewer than the number of bin edges
After trying to change the number of bins to 5, I still get the same error.
How can I fix this?
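I was facing the same issue, and this is what I did to overcome it.
bins = the number of intervals the data is sliced into
labels = the names assigned to those intervals, one per interval
This error appears when the number of labels does not match the number of bins. With duplicates='drop', repeated bin edges are removed, so you can end up with fewer bins than the q you asked for, and the labels no longer fit.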
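To see why the label count matters, here is a minimal sketch on made-up data (the Series `s` and the labels are hypothetical, not taken from the question). Asking for 5 quantile bins on data with many repeated values leaves only 3 bins once duplicate edges are dropped, so 5 labels no longer fit:

```python
import pandas as pd

# Hypothetical data: six zeros make several quantile edges identical.
s = pd.Series([0] * 6 + [10, 20, 30, 40])

# Without labels, retbins=True also returns the bin edges that survived.
binned, edges = pd.qcut(s, q=5, duplicates='drop', retbins=True)
n_bins = len(edges) - 1
print(n_bins)  # 3 bins instead of the 5 requested

# Passing 5 labels for 3 bins reproduces the error from the question.
error_message = None
try:
    pd.qcut(s, q=5, labels=['a', 'b', 'c', 'd', 'e'], duplicates='drop')
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```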
Follow these steps:
Step 1: Don't pass labels initially
# Inspect the bins first, without overwriting the column
pd.qcut(train['MasVnrArea'], q=5, duplicates='drop').value_counts()
This will result in:
(-0.001, 16.0] 880
(205.2, 1600.0] 292
(16.0, 205.2] 288
Name: MasVnrArea, dtype: int64
Step 2: Now we can see that only three bins are possible for this column. So, assign labels accordingly. In my case that is 3, so I am passing 3 labels.
bin_labels_MasVnrArea = ['Platinum_MasVnrArea', 'Diamond_MasVnrArea', 'Supreme_MasVnrArea']
train['MasVnrArea'] = pd.qcut(train['MasVnrArea'], q=5, labels=bin_labels_MasVnrArea, duplicates='drop')
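If you would rather not count the bins by hand, the two steps can be folded into one helper. This is only a sketch: the function name `qcut_with_fitted_labels` and the sample data are my own, and keeping the first labels when bins are dropped is an assumption that you may want to change for your data.

```python
import pandas as pd

def qcut_with_fitted_labels(values, q, labels):
    """Bin values into up to q quantile bins, trimming labels to fit."""
    # First pass without labels: find how many bins survive duplicates='drop'.
    _, edges = pd.qcut(values, q=q, duplicates='drop', retbins=True)
    n_bins = len(edges) - 1
    # Assumption: keep the first n_bins labels; pick another rule if needed.
    return pd.qcut(values, q=q, labels=labels[:n_bins], duplicates='drop')

# Hypothetical data with many repeated values.
values = pd.Series([0] * 6 + [10, 20, 30, 40])
binned = qcut_with_fitted_labels(values, 5, ['a', 'b', 'c', 'd', 'e'])
print(binned.value_counts())
```

The helper calls qcut twice, which costs a second pass over the data but keeps the logic easy to follow.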
Please watch this video on bins for a clear understanding.
https://www.youtube.com/watch?v=HofOMf8RgjM