Filtering dataframes in pandas : use a list of conditions

Tags:

python

pandas

I have a pandas dataframe with two columns, 'col1' and 'col2'.

I can filter on certain values of those two columns using:

df[ (df["col1"]=='foo') & (df["col2"]=='bar')]

Is there any way I can filter both columns at once?

I naively tried restricting the dataframe to those two columns, but my best guesses for the right-hand side of the equality don't work:

df[df[["col1","col2"]]==['foo','bar']]

yields this error:

ValueError: Invalid broadcasting comparison [['foo', 'bar']] with block values

I need to do this because both the names of the columns and the number of columns on which the condition is set will vary.

asked Nov 13 '15 by WNG


2 Answers

To the best of my knowledge, there is no built-in way in pandas to do exactly what you want. However, although the following solution may not be the prettiest, you can zip a set of parallel lists as follows:

cols = ['col1', 'col2']
conditions = ['foo', 'bar']

df[eval(" & ".join(["(df['{0}'] == '{1}')".format(col, cond) 
   for col, cond in zip(cols, conditions)]))]

The string join results in the following:

>>> " & ".join(["(df['{0}'] == '{1}')".format(col, cond) 
    for col, cond in zip(cols, conditions)])

"(df['col1'] == 'foo') & (df['col2'] == 'bar')"

which you then evaluate with eval, effectively:

df[eval("(df['col1'] == 'foo') & (df['col2'] == 'bar')")]

For example:

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})

>>> df
  col1  col2
0  foo   bar
1  bar  spam
2  baz   ham

>>> df[eval(" & ".join(["(df['{0}'] == {1})".format(col, repr(cond)) 
            for col, cond in zip(cols, conditions)]))]
  col1 col2
0  foo  bar
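A related approach that avoids hand-building `df[...]` expressions is `DataFrame.query`, which also evaluates a string but only needs the bare column names. A minimal sketch of the same dynamic filter, assuming the column names are valid identifiers for `query`:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})
cols = ['col1', 'col2']
conditions = ['foo', 'bar']

# Build "col1 == 'foo' and col2 == 'bar'" and let pandas evaluate it.
expr = " and ".join("{} == {!r}".format(col, cond)
                    for col, cond in zip(cols, conditions))
print(df.query(expr))
```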
answered Dec 05 '22 by Alexander


I would like to point out an alternative to the accepted answer, since eval is not necessary to solve this problem.

from functools import reduce

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})
cols = ['col1', 'col2']
values = ['foo', 'bar']
# Materialize the pairs: in Python 3, zip returns a one-shot iterator,
# so len(conditions) would fail and the pairs could only be consumed once.
conditions = list(zip(cols, values))

# Explicit-loop version: AND the boolean comparisons together one by one.
def apply_conditions(df, conditions):
    assert len(conditions) > 0
    comps = [df[c] == v for c, v in conditions]
    result = comps[0]
    for comp in comps[1:]:
        result &= comp
    return result

# Equivalent version using functools.reduce
# (note: this definition shadows the one above).
def apply_conditions(df, conditions):
    assert len(conditions) > 0
    comps = [df[c] == v for c, v in conditions]
    return reduce(lambda c1, c2: c1 & c2, comps[1:], comps[0])

df[apply_conditions(df, conditions)]
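For completeness, the row-wise comparison the question attempted can also be done fully vectorized: compare the selected columns against the value list and keep rows where every comparison holds. A minimal sketch, assuming a recent pandas version where broadcasting a list against a column subset works:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})
cols = ['col1', 'col2']
values = ['foo', 'bar']

# df[cols] == values broadcasts the list column-wise;
# .all(axis=1) keeps only rows where every column matched.
print(df[(df[cols] == values).all(axis=1)])
```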
answered Dec 05 '22 by Michael Hoff