
Pandas qcut: ValueError: Bin edges must be unique

I'm using qcut from pandas to discretize my data into equal-sized buckets. I want to have price buckets. This is my DataFrame:

        productId   sell_prix   categ   popularity
11997   16758760.0  28.75        50      524137.0
11998   16758760.0  28.75        50      166795.0
13154   16782105.0  24.60        50      126890.5
13761   16790082.0  65.00        50      245437.0
13762   16790082.0  65.00        50      245242.0
15355   16792720.0  29.00        50      360219.0
15356   16792720.0  29.00        50      360100.0
15357   16792720.0  29.00        50      360027.0
15358   16792720.0  29.00        50      462850.0
15367   16792728.0  29.00        50      193030.5

And this is my code:

df['PriceBucket'] = pd.qcut(df['sell_prix'], 3)

I get this error message:

**ValueError: Bin edges must be unique: array([ 24.6,  29. ,  29. ,  65. ])**

In reality, my DataFrame has 7413 rows, so this is just a sample of it. The strange thing is that the same code works on a DataFrame with 359824 rows containing practically the same data. Does the result depend on the length of the DataFrame?

Help please! Many thanks.

Arij SEDIRI asked Jul 11 '16 14:07


1 Answer

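The error occurs because the quantile edges computed from the data contain duplicates: in the message above, the 1/3 and 2/3 quantiles both equal 29, so the middle bucket would be empty. Various solutions are discussed here, but briefly, ranking the values with method='first' breaks ties and makes the edges unique: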

> pd.qcut(df['a'].rank(method='first'), 3)
0        [1, 2.333]
1        [1, 2.333]
2    (2.333, 3.667]
3        (3.667, 5]
4        (3.667, 5]

Or

> pd.qcut(df['a'].rank(method='first'), 3, labels=False)
0    0
1    0
2    1
3    2
4    2
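
To tie this back to the question, here is a minimal sketch using the ten sell_prix values from the sample rows (the variable names are illustrative, not from the original post). It first shows where the duplicated edge comes from, then applies the rank approach above. It also shows the duplicates='drop' option of pd.qcut, which I am assuming is available (it was added in pandas 0.20, after the question was asked) and which merges the duplicated edges instead of raising.

```python
import pandas as pd

# Prices copied from the sample rows in the question.
prices = pd.Series(
    [28.75, 28.75, 24.60, 65.00, 65.00, 29.00, 29.00, 29.00, 29.00, 29.00],
    name="sell_prix",
)

# The tercile edges qcut would use: the two inner edges coincide at 29,
# which is what triggers "Bin edges must be unique".
edges = prices.quantile([0, 1 / 3, 2 / 3, 1]).tolist()
print(edges)

# Option 1: rank first so ties are broken by order of appearance,
# then cut the (now distinct) ranks into three buckets.
by_rank = pd.qcut(prices.rank(method="first"), 3, labels=False)
print(by_rank.value_counts().sort_index())

# Option 2 (assumes pandas >= 0.20): drop the duplicated edge.
# This produces fewer buckets than the three requested.
dropped = pd.qcut(prices, 3, duplicates="drop", labels=False)
print(dropped.value_counts().sort_index())
```

Note the trade-off: with the rank approach, identical prices can land in different buckets, while duplicates='drop' keeps identical prices together but gives unequal bucket sizes. Whether the error appears depends on how the values are distributed around the quantile cut points rather than on the number of rows as such, which may explain why the larger DataFrame happened to work.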
luca answered Oct 13 '22 12:10