Consider a Series with the following percentiles:
> df['col_1'].describe(percentiles=np.linspace(0, 1, 20))
count      13859.000000
mean         421.772842
std        14665.298998
min            1.201755
0%             1.201755
5.3%           1.430695
10.5%          1.438417
15.8%          1.466462
21.1%          1.473050
26.3%          1.500834
31.6%          1.512218
36.8%          1.542935
42.1%          1.579845
47.4%          1.647162
50%            1.690612
52.6%          1.749047
57.9%          1.955589
63.2%          2.344475
68.4%          3.075641
73.7%          4.466094
78.9%          8.410964
84.2%         14.998738
89.5%         41.363612
94.7%        162.865079
100%     1511013.790233
max      1511013.790233
Name: col_1, dtype: float64
I would like to get another column, col_2, with the percentile each row was assigned to in the calculation made above.
How can I do that in Pandas?
By default, the Pandas quantile() method uses q=0.5, which returns the 50th percentile (the median).
The numpy.quantile() function takes an array and a number q between 0 and 1, and returns the value at the q-th quantile. For example, numpy.quantile(data, 0.25) returns the first quartile of the dataset data.
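As a small illustration of the two calls described above, here is a sketch on a made-up five-element Series (the data and variable names are assumptions for the example, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen so the quantiles are easy to check by hand
data = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Series.quantile() with no argument uses q=0.5, i.e. the median
median_default = data.quantile()

# numpy.quantile takes the data and a q between 0 and 1
first_quartile = np.quantile(data, 0.25)

print(median_default, first_quartile)
```

Both functions interpolate linearly between neighbouring values by default, so results on real data are often values that do not appear in the Series itself.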
For a DataFrame df, df.groupby('Category').field_A.quantile(0.1) returns the 10th percentile of field_A for each group of Category.
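A minimal sketch of the per-group quantile, assuming a DataFrame with the Category and field_A columns named above (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical data: two categories with three values each
df = pd.DataFrame({
    "Category": ["a", "a", "a", "b", "b", "b"],
    "field_A": [0.0, 10.0, 20.0, 100.0, 200.0, 300.0],
})

# 10th percentile of field_A within each Category (linear interpolation)
p10 = df.groupby("Category")["field_A"].quantile(0.1)

print(p10)
```

The result is a Series indexed by Category, so p10["a"] and p10["b"] give the value for each group.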
Step 1: Define a Pandas series.
Step 2: Input percentile value.
Step 3: Calculate the percentile.
Step 4: Print the percentile.
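The four steps above could look roughly like this. The Series contents and the choice of the 90th percentile are assumptions made for the example:

```python
import pandas as pd

# Step 1: define a Pandas series (hypothetical values)
series = pd.Series([10, 20, 30, 40, 50])

# Step 2: input the percentile value, here on a 0-100 scale
percentile = 90

# Step 3: calculate the percentile; quantile() expects a fraction between 0 and 1
value = series.quantile(percentile / 100)

# Step 4: print the percentile
print(f"{percentile}th percentile: {value}")
```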
import pandas as pd

df2 = pd.DataFrame(range(1000), columns=['a1'])
# Split a1 into 100 equal-sized bins; labels=False returns the bin index (0-99)
df2['percentile'] = pd.qcut(df2['a1'], 100, labels=False)
Or leave out labels=False to see the range (interval) that each value falls into.
Note that in Python 3, with Pandas 0.16.2 (the latest version at the time of writing), you need to use list(range(1000)) instead of range(1000) for the above to work.
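Applied to the question, one possible approach is to pass the same np.linspace(0, 1, 20) edges to pd.qcut, so that col_2 holds the index of the percentile bin each row falls into. This is a sketch under assumptions: the data below is randomly generated as a stand-in for col_1, and pct_rank is a hypothetical extra column showing an alternative (the percentile rank of each row) rather than something the question asked for.

```python
import numpy as np
import pandas as pd

# Made-up, heavily skewed data standing in for the real col_1
rng = np.random.default_rng(0)
df = pd.DataFrame({"col_1": rng.lognormal(mean=1.0, sigma=2.0, size=1000)})

# The same 20 quantile edges used in describe(); 20 edges define 19 bins
edges = np.linspace(0, 1, 20)

# labels=False returns the integer bin index (0 to 18) for each row
df["col_2"] = pd.qcut(df["col_1"], q=edges, labels=False)

# Alternative: each row's own percentile rank, between 0 and 1
df["pct_rank"] = df["col_1"].rank(pct=True)

print(df.head())
```

If col_1 contains many repeated values, some quantile edges can coincide and pd.qcut raises an error about non-unique bin edges; passing duplicates='drop' is one way around that, at the cost of getting fewer bins.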