How to select all rows which contain values greater than a threshold?

Tags:

python

pandas

dataframe

The request is simple: I want to select all rows which contain a value greater than a threshold.

If I do it like this:

df[(df > threshold)]

I get these rows, but values below that threshold are simply NaN. How do I avoid selecting these rows?
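
To make the behaviour concrete, here is a minimal sketch of what I see. The frame and the threshold value are made up for illustration; only the indexing expression is the one from above.

```python
import pandas as pd

# Hypothetical data and threshold, chosen only for illustration.
df = pd.DataFrame({"a": [1, 5, 2], "b": [7, 3, 1]})
threshold = 4

# Indexing with a boolean DataFrame masks element-wise: every row is kept,
# and each entry that fails the condition is replaced by NaN.
masked = df[df > threshold]
print(masked)
```

The result has the same shape as df, including the last row, which contains nothing above the threshold and comes back as all NaN.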

asked Mar 05 '17 20:03

Stefan Falk

2 Answers

There is no need for the double transposition: you can simply call any along the column axis (axis=1 or axis='columns') on your Boolean matrix. Pass the axis as a keyword argument, since newer pandas versions no longer accept it positionally.

df[(df > threshold).any(axis=1)]

Example

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0, 100, 50).reshape(5, 10))
>>> df
    0   1   2   3   4   5   6   7   8   9
0  45  53  89  63  62  96  29  56  42   6
1   0  74  41  97  45  46  38  39   0  49
2  37   2  55  68  16  14  93  14  71  84
3  67  45  79  75  27  94  46  43   7  40
4  61  65  73  60  67  83  32  77  33  96
>>> df[(df > 95).any(axis=1)]
    0   1   2   3   4   5   6   7   8   9
0  45  53  89  63  62  96  29  56  42   6
1   0  74  41  97  45  46  38  39   0  49
4  61  65  73  60  67  83  32  77  33  96

Transposing as your self-answer does is just an unnecessary performance hit.

df = pd.DataFrame(np.random.randint(0, 100, 10**8).reshape(10**4, 10**4))

# standard way
%timeit df[(df > 95).any(axis=1)]
1 loop, best of 3: 8.48 s per loop

# transposing
%timeit df[df.T[(df.T > 95)].any()]
1 loop, best of 3: 13 s per loop
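
The comparison above can be reproduced outside IPython with the standard-library timeit module. This is only a rough sketch: the smaller frame size, the fixed seed, and the helper names select_any and select_transposed are assumptions for illustration, and the timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
df = pd.DataFrame(rng.integers(0, 100, size=(1000, 1000)))


def select_any(frame, threshold):
    # Row mask built directly from the boolean matrix.
    return frame[(frame > threshold).any(axis=1)]


def select_transposed(frame, threshold):
    # The double-transposition form from the self-answer.
    return frame[frame.T[frame.T > threshold].any()]


# For this non-negative data both forms select the same rows.
same = select_any(df, 95).equals(select_transposed(df, 95))
print("same rows:", same)

# Time each form over a few repetitions; the numbers are machine-dependent.
t_any = timeit.timeit(lambda: select_any(df, 95), number=5)
t_transposed = timeit.timeit(lambda: select_transposed(df, 95), number=5)
print(f"any(axis=1): {t_any:.3f} s, transposed: {t_transposed:.3f} s")
```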

answered Sep 18 '22 01:09

miradulo

This is actually very simple:

df[df.T[(df.T > 0.33)].any()]
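
One caveat with this form, as far as I can tell: the inner indexing keeps the original values and replaces everything else with NaN, and any() then tests those values for truthiness. A 0 that exceeds a negative threshold therefore does not count. A small sketch with made-up data (the frame and the threshold of -1 are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: row 0 holds zeros, which are above the threshold of -1.
df = pd.DataFrame({"a": [0, -5], "b": [0, -7]})
threshold = -1

# Transposed form: surviving values are the zeros themselves, which are falsy.
via_transpose = df.T[df.T > threshold].any()

# Boolean-matrix form: tests the comparison result, not the values.
via_mask = (df > threshold).any(axis=1)

print(via_transpose.tolist())  # [False, False]
print(via_mask.tolist())       # [True, False]
```

With positive thresholds this does not come up, since every value that passes the comparison is non-zero.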

answered Sep 19 '22 01:09

Stefan Falk
