
Placing every value in its percentile in Pandas

Consider a Series with the following percentiles:

> df['col_1'].describe(percentiles=np.linspace(0, 1, 20))

count      13859.000000
mean         421.772842
std        14665.298998
min            1.201755
0%             1.201755
5.3%           1.430695
10.5%          1.438417
15.8%          1.466462
21.1%          1.473050
26.3%          1.500834
31.6%          1.512218
36.8%          1.542935
42.1%          1.579845
47.4%          1.647162
50%            1.690612
52.6%          1.749047
57.9%          1.955589
63.2%          2.344475
68.4%          3.075641
73.7%          4.466094
78.9%          8.410964
84.2%         14.998738
89.5%         41.363612
94.7%        162.865079
100%     1511013.790233
max      1511013.790233
Name: col_1, dtype: float64

I would like to get another column col_2 with the percentile each row was assigned to in the calculation made above.

How can I do that in Pandas?

Amelio Vazquez-Reina asked Jun 18 '15 19:06





1 Answer

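import pandas as pd

df2 = pd.DataFrame(range(1000))
df2.columns = ['a1']
# labels=False returns each row's integer bin index (0-99) instead of an interval label
df2['percentile'] = pd.qcut(df2.a1, 100, labels=False)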

Or leave out labels=False to see the value range (interval) of the bin each row falls into, rather than its integer index.
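
To make the difference concrete, here is a minimal sketch that runs pd.qcut on the same toy data both ways. The variable names bin_index and bin_range are only illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({"a1": list(range(1000))})

# With labels=False, qcut returns the integer index of each row's bin.
bin_index = pd.qcut(df2["a1"], 100, labels=False)

# Without labels, qcut returns a categorical of Interval objects,
# so each row carries the value range of the bin it fell into.
bin_range = pd.qcut(df2["a1"], 100)

print(bin_index.iloc[0])
print(bin_range.iloc[0])
```

The integer form is usually easier to store as a column; the interval form is handy for checking where the bin edges actually landed.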


Note that in Python 3, with Pandas 0.16.2 (latest version as of today), you need to use list(range(1000)) instead of range(1000) for the above to work.
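
For the case in the question, a rough sketch along the same lines would pass the same quantile edges to pd.qcut that were passed to describe(). The data below is a made-up skewed sample standing in for col_1, and col_2_pct is a hypothetical extra column, so treat this as an illustration rather than a tested recipe for the original data. If the real column has many repeated values, qcut may complain about duplicate bin edges.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's heavily skewed column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"col_1": rng.lognormal(mean=1.0, sigma=2.0, size=5000)})

# Same quantile edges as the describe() call: 20 points, hence 19 bins.
edges = np.linspace(0, 1, 20)

# Integer bin index (0..18) for each row.
df["col_2"] = pd.qcut(df["col_1"], q=edges, labels=False)

# Optional: label each row with the upper percentile of its bin (5.3, 10.5, ...).
upper_pct = (edges[1:] * 100).round(1)
df["col_2_pct"] = upper_pct[df["col_2"].to_numpy(dtype=int)]

print(df.head())
```

Note that np.linspace(0, 1, 20) yields 20 edges and therefore 19 bins; use np.linspace(0, 1, 21) if 20 equal-width percentile bins are intended.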

howMuchCheeseIsTooMuchCheese answered Sep 24 '22 21:09
