I have a pandas <code>DataFrame</code> called <code>data</code> with a column called <code>ms</code>. I want to eliminate all the rows where <code>data.ms</code> is above the 95th percentile. For now, I'm doing this: <pre class="prettyprint"><code>limit = data.ms.describe(90)['95%']
valid_data = data[data['ms'] < limit]
</code></pre> which works, but I want to generalize that to any percentile. What's the best way to do that?

Use the <code>Series.quantile()</code> method: <pre class="prettyprint"><code>In [48]: cols = list('abc')

In [49]: df = DataFrame(randn(10, len(cols)), columns=cols)

In [50]: df.a.quantile(0.95)
Out[50]: 1.5776961953820687
</code></pre> To filter out rows of <code>df</code> where <code>df.a</code> is greater than or equal to the 95th percentile do: <pre class="prettyprint"><code>In [72]: df[df.a < df.a.quantile(.95)]
Out[72]:
       a      b      c
0 -1.044 -0.247 -1.149
2  0.395  0.591  0.764
3 -0.564 -2.059  0.232
4 -0.707 -0.736 -1.345
5  0.978 -0.099  0.521
6 -0.974  0.272 -0.649
7  1.228  0.619 -0.849
8 -0.170  0.458 -0.515
9  1.465  1.019  0.966
</code></pre>
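To generalize this to any percentile, as the question asks, the same two steps can be wrapped in a small helper. This is only a sketch: the function name `drop_above_quantile` and the randomly generated `ms` column are made up for illustration, and it assumes `q` is given as a fraction between 0 and 1, which is what `Series.quantile()` expects.

```python
import numpy as np
import pandas as pd


def drop_above_quantile(frame, column, q):
    """Keep only the rows whose `column` value is strictly below the q-th quantile.

    Hypothetical helper, not part of pandas. `q` is a fraction in [0, 1].
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be a fraction between 0 and 1")
    limit = frame[column].quantile(q)
    return frame[frame[column] < limit]


# Illustrative data only: 1000 random values in a column named `ms`.
rng = np.random.default_rng(0)
data = pd.DataFrame({"ms": rng.normal(size=1000)})

valid_data = drop_above_quantile(data, "ms", 0.95)
print(len(data), len(valid_data))
```

Because the comparison is `<`, rows exactly equal to the limit are dropped too, matching the answer above. If those rows should be kept, `<=` could be used instead.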

Eliminating all data over a given percentile

asked Sep 02 '13 20:09

Roy Smith

1 Answer

answered Sep 20 '22 06:09

Phillip Cloud

Tags:

python

pandas

filtering

percentile
