Filter pandas dataframe rows by multiple column values

Tags: python, pandas, dataframe

I have a pandas dataframe containing rows with numbered columns:

    1  2  3  4  5
a   0  0  0  0  1
b   1  1  2  1  9             
c   2  2  2  2  2
d   5  5  5  5  5
e   8  9  9  9  9

How can I filter the rows by whether the values in a subset of columns are all above (or all below) a certain value?

For example, I want to remove every row in which the values in columns 1 to 3 are not all greater than 3. In the frame above, that would leave only rows d and e.

The columns I am filtering and the value I am checking against are both arguments.

I've tried a few things; this is the closest I've gotten:

df[df[range(1,3)]>3]
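To see why that attempt does not drop rows, here is a small sketch, assuming the frame has integer column labels 1 to 5 as displayed above. Indexing a DataFrame with a boolean DataFrame masks individual cells (it behaves like df.where) instead of selecting rows, and range(1, 3) stops before 3, so column 3 is never checked.

```python
import pandas as pd

# Rebuild the example frame; integer column labels are assumed from the display.
df = pd.DataFrame(
    {
        1: [0, 1, 2, 5, 8],
        2: [0, 1, 2, 5, 9],
        3: [0, 2, 2, 5, 9],
        4: [0, 1, 2, 5, 9],
        5: [1, 9, 2, 5, 9],
    },
    index=list("abcde"),
)

# range(1, 3) excludes its end point, so only columns 1 and 2 are selected.
print(list(range(1, 3)))  # [1, 2]

# A boolean DataFrame used as an indexer masks cells rather than rows:
# every row is kept, and cells that fail (or were never compared) become NaN.
masked = df[df[[1, 2]] > 3]
print(masked.shape)  # (5, 5)
print(masked)
```

What is needed instead is a single True/False per row, which is what reducing the boolean frame with all(axis=1) provides.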

Any ideas?

asked Jun 09 '16 22:06

quannabe

2 Answers

I used loc and all in this function:

def filt(df, cols, thresh):
    # Compare the chosen columns to thresh, then keep rows where every comparison is True
    return df.loc[(df[cols] > thresh).all(axis=1)]

filt(df, [1, 2, 3], 3)

   1  2  3  4  5
d  5  5  5  5  5
e  8  9  9  9  9
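The question also mentions "all below". One way to cover both directions, sketched here under the assumption that passing a comparison function as an extra argument is acceptable, is to take a function from the standard operator module (operator.gt, operator.lt, and so on). The name filt_cmp is hypothetical.

```python
import operator

import pandas as pd


def filt_cmp(df, cols, thresh, cmp=operator.gt):
    """Keep rows where cmp(value, thresh) is True for every column in cols.

    filt_cmp is a hypothetical helper; cmp defaults to '>' (operator.gt).
    """
    mask = cmp(df[cols], thresh).all(axis=1)  # one bool per row
    return df.loc[mask]


df = pd.DataFrame(
    {1: [0, 1, 2, 5, 8], 2: [0, 1, 2, 5, 9], 3: [0, 2, 2, 5, 9],
     4: [0, 1, 2, 5, 9], 5: [1, 9, 2, 5, 9]},
    index=list("abcde"),
)

above = filt_cmp(df, [1, 2, 3], 3)               # rows d and e
below = filt_cmp(df, [1, 2, 3], 3, operator.lt)  # rows a, b and c
print(above)
print(below)
```

Passing operator.ge or operator.le instead gives the inclusive versions of the same filters.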

answered Nov 14 '22 23:11

piRSquared

You can achieve this without using apply:

In [73]:
df[(df.iloc[:, 0:3] > 3).all(axis=1)]

Out[73]:
   1  2  3  4  5
d  5  5  5  5  5
e  8  9  9  9  9

This slices the df to just the first 3 columns by position using iloc (the older ix indexer was removed in pandas 1.0), compares that slice against the scalar 3, and then calls all(axis=1) to build a boolean Series with one value per row, which is used to mask the index.
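If you would rather select the columns by label than by position, note that .loc slicing includes both end labels, whereas positional .iloc slicing excludes the stop position. A minimal sketch comparing the two, assuming integer column labels 1 to 5 as shown:

```python
import pandas as pd

df = pd.DataFrame(
    {1: [0, 1, 2, 5, 8], 2: [0, 1, 2, 5, 9], 3: [0, 2, 2, 5, 9],
     4: [0, 1, 2, 5, 9], 5: [1, 9, 2, 5, 9]},
    index=list("abcde"),
)

# Positional: columns at positions 0, 1, 2 (stop position 3 is excluded).
by_position = df[(df.iloc[:, 0:3] > 3).all(axis=1)]

# Label-based: columns labelled 1 through 3 (both end labels included).
by_label = df[(df.loc[:, 1:3] > 3).all(axis=1)]

print(by_label)
```

Both select the same three columns here, so both masks keep rows d and e.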

answered Nov 15 '22 00:11

EdChum
