I'm using qcut from pandas to discretize my data into equal-sized buckets, because I want price buckets. This is my DataFrame:
        productId  sell_prix  categ  popularity
11997  16758760.0      28.75     50    524137.0
11998  16758760.0      28.75     50    166795.0
13154  16782105.0      24.60     50    126890.5
13761  16790082.0      65.00     50    245437.0
13762  16790082.0      65.00     50    245242.0
15355  16792720.0      29.00     50    360219.0
15356  16792720.0      29.00     50    360100.0
15357  16792720.0      29.00     50    360027.0
15358  16792720.0      29.00     50    462850.0
15367  16792728.0      29.00     50    193030.5
And this is my code:
df['PriceBucket'] = pd.qcut(df['sell_prix'], 3)
I get this error message:
**ValueError: Bin edges must be unique: array([ 24.6, 29. , 29. , 65. ])**
In reality, my DataFrame has 7413 rows, so this is just a sample of it. The strange thing is that the same code works on a DataFrame with 359824 rows containing practically the same data. Does the result depend on the length of the DataFrame?
Help please! Many thanks.
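Various solutions are discussed here, but briefly: rank the values first, so that ties are broken and the bin edges become unique (df['a'] stands for your column, such as df['sell_prix']):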
> pd.qcut(df['a'].rank(method='first'), 3)
0 [1, 2.333]
1 [1, 2.333]
2 (2.333, 3.667]
3 (3.667, 5]
4 (3.667, 5]
Or
> pd.qcut(df['a'].rank(method='first'), 3, labels=False)
0 0
1 0
2 1
3 2
4 2
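To tie this back to the question, here is a minimal sketch that rebuilds the ten sample prices, reproduces the error, and applies two possible fixes. The error appears because 29.00 is repeated so often that both the 1/3 and 2/3 quantiles equal 29.0, so it depends on how the values are distributed rather than on the length of the DataFrame as such. The column name PriceBucketDropped is made up for illustration, and duplicates="drop" is an alternative to the ranking approach above that assumes a pandas version providing that parameter.

```python
import pandas as pd

# The ten sell_prix values from the sample in the question.
df = pd.DataFrame({
    "sell_prix": [28.75, 28.75, 24.60, 65.00, 65.00,
                  29.00, 29.00, 29.00, 29.00, 29.00]
})

# Reproduce the problem: the 1/3 and 2/3 quantiles are both 29.0,
# so qcut cannot build three distinct bins.
try:
    pd.qcut(df["sell_prix"], 3)
    raised = False
except ValueError as err:
    raised = True
    print("qcut failed:", err)

# Fix 1: rank first. method="first" gives tied prices distinct ranks
# (in order of appearance), so the quantile edges are unique.
ranks = df["sell_prix"].rank(method="first")
df["PriceBucket"] = pd.qcut(ranks, 3, labels=False)

# Fix 2 (hypothetical column name): let qcut drop the duplicate edges.
# This keeps real price intervals but may return fewer than 3 buckets.
df["PriceBucketDropped"] = pd.qcut(df["sell_prix"], 3, duplicates="drop")

print(df)
```

The trade-off: ranking always yields the requested number of roughly equal-sized buckets, but identical prices can end up in different buckets. Dropping duplicate edges keeps identical prices together, at the cost of fewer and unevenly sized buckets.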