Calculate percentile for every value in a column of dataframe

Tags: performance, python, pandas, scipy, percentile

I am trying to calculate the percentile of every value in column a of a DataFrame x.

Is there a better way to write the following piece of code?

from scipy import stats

x["pcta"] = [stats.percentileofscore(x["a"].values, i)
             for i in x["a"].values]

I would like to see better performance.

asked May 27 '17 00:05

Praveen Gupta Sanka

1 Answer

It seems like you want Series.rank():

x['pcta'] = x['a'].rank(pct=True)  # fraction in (0, 1]; multiply by 100 for a percentage

Performance (rank is roughly 8x faster in this run):

import scipy.stats as scs

%timeit [scs.percentileofscore(x["a"].values, i) for i in x["a"].values]
1000 loops, best of 3: 877 µs per loop

%timeit x.rank(pct=True)
10000 loops, best of 3: 107 µs per loop
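
To check that the two approaches agree, here is a minimal sketch. The sample values are made up (the question does not show x), and the comparison relies on both functions averaging the ranks of tied values by default, which is my understanding of percentileofscore's default kind='rank' and rank's default method='average'.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up sample data (an assumption; the original x is not shown).
x = pd.DataFrame({"a": [10, 20, 20, 30, 40]})

# Original approach: one percentileofscore call per value, on a 0-100 scale.
slow = [stats.percentileofscore(x["a"].values, v) for v in x["a"].values]

# Vectorized approach: rank as a fraction of the count, scaled to 0-100.
x["pcta"] = x["a"].rank(pct=True) * 100

# Ties are averaged in both cases, so the results should match.
assert np.allclose(slow, x["pcta"])
```

Note that rank() skips NaN values by default, so if the column has missing data the two approaches may not line up without extra handling.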

answered Oct 06 '22 12:10

Brad Solomon
