Filtering dataframes in pandas: use a list of conditions

Tags:

pandas

I have a pandas dataframe with two columns: 'col1' and 'col2'.

I can filter on specific values of those two columns using:

df[ (df["col1"]=='foo') & (df["col2"]=='bar')]

Is there any way I can filter both columns at once?

I naively tried restricting the dataframe to the two columns and comparing it against a list, but none of my guesses for the right-hand side of the equality work:

df[df[["col1","col2"]]==['foo','bar']]

gives me this error:

ValueError: Invalid broadcasting comparison [['foo', 'bar']] with block values

I need this because the names of the columns, as well as the number of columns the condition applies to, will vary.


asked Nov 13 '15 18:11

2 Answers

To the best of my knowledge, there is no way in Pandas for you to do what you want. However, although the following solution may not be the prettiest, you can zip a set of parallel lists as follows:

cols = ['col1', 'col2']
conditions = ['foo', 'bar']

df[eval(" & ".join(["(df['{0}'] == '{1}')".format(col, cond) 
   for col, cond in zip(cols, conditions)]))]

The string join results in the following:

>>> " & ".join(["(df['{0}'] == '{1}')".format(col, cond)
...     for col, cond in zip(cols, conditions)])

"(df['col1'] == 'foo') & (df['col2'] == 'bar')"

You then pass that string to eval, so the filter is effectively:

df[eval("(df['col1'] == 'foo') & (df['col2'] == 'bar')")]

For example:

import pandas as pd

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})

>>> df
  col1  col2
0  foo   bar
1  bar  spam
2  baz   ham

>>> df[eval(" & ".join(["(df['{0}'] == {1})".format(col, repr(cond))
...             for col, cond in zip(cols, conditions)]))]
  col1 col2
0  foo  bar
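The second snippet wraps each value in repr() instead of hard-coding quotes in the format string. A minimal sketch of why that matters, using a hypothetical 'name' column and a value containing an apostrophe: with hand-written quotes the generated expression is no longer valid Python, while repr() chooses a quoting that survives eval.

```python
import pandas as pd

# Hypothetical data: a value containing a single quote.
df = pd.DataFrame({'name': ["O'Brien", 'Smith']})
cols = ['name']
conditions = ["O'Brien"]

# Hand-written quotes produce "(df['name'] == 'O'Brien')", which is a syntax error.
naive = " & ".join("(df['{0}'] == '{1}')".format(col, cond)
                   for col, cond in zip(cols, conditions))

# repr() picks suitable quotes and produces (df['name'] == "O'Brien").
safe = " & ".join("(df['{0}'] == {1})".format(col, repr(cond))
                  for col, cond in zip(cols, conditions))

try:
    eval(naive)
    naive_ok = True
except SyntaxError:
    naive_ok = False

print(naive_ok)          # False
print(df[eval(safe)])    # keeps only the "O'Brien" row
```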


answered Dec 05 '22 07:12

Alexander
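The same string-building idea can be sketched with pandas' own DataFrame.query, which parses a boolean expression string against the frame's columns, so Python's built-in eval is not needed. This assumes pandas 0.25 or later (for backtick-quoted column names); the helper name build_query is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})
cols = ['col1', 'col2']
conditions = ['foo', 'bar']

def build_query(cols, values):
    # Hypothetical helper: backticks allow column names with spaces,
    # and !r (repr) quotes each value safely.
    return " & ".join(f"(`{col}` == {val!r})" for col, val in zip(cols, values))

query_string = build_query(cols, conditions)
# query_string == "(`col1` == 'foo') & (`col2` == 'bar')"
filtered = df.query(query_string)
print(filtered)
```

Because query only evaluates expressions against the dataframe, it avoids handing an arbitrary string to eval.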

I would like to point out an alternative to the accepted answer, as eval is not necessary to solve this problem.

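import pandas as pd
from functools import reduce

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})
cols = ['col1', 'col2']
values = ['foo', 'bar']
# zip() returns a one-shot iterator in Python 3, so turn it into a list
# to allow len() and repeated iteration.
conditions = list(zip(cols, values))

# Version 1: combine the boolean masks with an explicit loop.
def apply_conditions(df, conditions):
    assert len(conditions) > 0
    comps = [df[c] == v for c, v in conditions]
    result = comps[0]
    for comp in comps[1:]:
        result &= comp
    return result

# Version 2: the same thing with functools.reduce
# (this definition replaces the one above).
def apply_conditions(df, conditions):
    assert len(conditions) > 0
    comps = [df[c] == v for c, v in conditions]
    return reduce(lambda c1, c2: c1 & c2, comps[1:], comps[0])

df[apply_conditions(df, conditions)]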
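The reduction step can also be handed to NumPy. This is a swapped-in technique, not part of the answer above: a sketch using numpy.logical_and.reduce, which AND-combines a list of boolean Series element-wise and returns a plain boolean array that df[...] accepts as a row mask.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})
conditions = [('col1', 'foo'), ('col2', 'bar')]

# One boolean Series per (column, value) pair.
comps = [df[c] == v for c, v in conditions]

# Element-wise AND across all comparisons; the result is a NumPy bool array.
mask = np.logical_and.reduce(comps)
print(df[mask])
```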

answered Dec 05 '22 07:12

Michael Hoff
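For plain equality tests there is also a vectorised route close to the question's original attempt. A sketch, assuming a reasonably recent pandas: DataFrame.eq, with its default axis='columns', compares each selected column with the list item at the same position, and .all(axis=1) keeps rows where every comparison is true. The explicit .eq call is used here instead of the bare == that raised the error in the question.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})
cols = ['col1', 'col2']
values = ['foo', 'bar']

# Compare each column in `cols` with the value at the same position in `values`.
matches = df[cols].eq(values)   # boolean DataFrame, same shape as df[cols]

# Keep rows where every listed column matched.
mask = matches.all(axis=1)
print(df[mask])
```

Since cols and values are ordinary lists, both the column names and the number of columns can vary at runtime.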
