Placing every value in its percentile in Pandas

Tags: python, pandas, statistics

Consider a Series with the following percentiles:

> df['col_1'].describe(percentiles=np.linspace(0, 1, 20))

count      13859.000000
mean         421.772842
std        14665.298998
min            1.201755
0%             1.201755
5.3%           1.430695
10.5%          1.438417
15.8%          1.466462
21.1%          1.473050
26.3%          1.500834
31.6%          1.512218
36.8%          1.542935
42.1%          1.579845
47.4%          1.647162
50%            1.690612
52.6%          1.749047
57.9%          1.955589
63.2%          2.344475
68.4%          3.075641
73.7%          4.466094
78.9%          8.410964
84.2%         14.998738
89.5%         41.363612
94.7%        162.865079
100%     1511013.790233
max      1511013.790233
Name: col_1, dtype: float64

I would like to add another column, col_2, holding the percentile bin that each row's col_1 value falls into in the calculation above.

How can I do that in Pandas?

asked Jun 18 '15 19:06

Amelio Vazquez-Reina

1 Answer

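import pandas as pd

df2 = pd.DataFrame(range(1000))
df2.columns = ['a1']
# qcut splits a1 into 100 equal-frequency bins; labels=False returns each row's bin index (0-99)
df2['percentile'] = pd.qcut(df2.a1, 100, labels=False)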

Or leave out labels=False to get the value range (interval) of each bin instead of its integer index.

Note that in Python 3 with Pandas 0.16.2 (the latest version at the time of writing), you need to pass list(range(1000)) instead of range(1000) for the above to work.
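To tie this back to the question's setup, here is a minimal sketch that reuses the same np.linspace(0, 1, 20) quantile edges passed to describe(). The data is synthetic (a hypothetical log-normal sample standing in for col_1), and the column names col_2_range and pct_rank are made up for illustration. It assumes the quantile edges are distinct, which holds for continuous data like this; heavily tied data may need qcut's duplicates='drop' option.

```python
import numpy as np
import pandas as pd

# Hypothetical skewed sample standing in for the question's col_1
rng = np.random.default_rng(0)
df = pd.DataFrame({"col_1": rng.lognormal(mean=0.0, sigma=2.0, size=1000)})

# Same quantile edges as describe(percentiles=np.linspace(0, 1, 20))
edges = np.linspace(0, 1, 20)

# Integer bin index per row: 20 edges give 19 bins, numbered 0..18
df["col_2"] = pd.qcut(df["col_1"], q=edges, labels=False)

# Without labels=False, each row gets the value interval of its bin
df["col_2_range"] = pd.qcut(df["col_1"], q=edges)

# Alternative: each row's own percentile rank, as a fraction in (0, 1]
df["pct_rank"] = df["col_1"].rank(pct=True)
```

The bin index answers "which of the describe() buckets does this row fall into", while rank(pct=True) gives a finer-grained per-row percentile that does not depend on any choice of bin edges.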

answered Sep 24 '22 21:09

howMuchCheeseIsTooMuchCheese
