I have a pandas DataFrame with two columns: 'col1' and 'col2'.
I can filter on specific values of those two columns using:
df[ (df["col1"]=='foo') & (df["col2"]=='bar')]
Is there any way I can filter on both columns at once?
I naively tried restricting the DataFrame to the two columns, but my best guesses for the right-hand side of the comparison don't work:
df[df[["col1","col2"]]==['foo','bar']]
which gives me this error:
ValueError: Invalid broadcasting comparison [['foo', 'bar']] with block values
I need to do this because both the names and the number of the columns the condition applies to will vary.
Filter Rows by Condition: You can use df[df["Courses"] == 'Spark'] to filter rows by a condition in a pandas DataFrame. Note that this expression returns a new DataFrame containing the selected rows. You can also write the same statement with the value held in a variable.
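A minimal sketch of that single-condition filter, assuming a small made-up DataFrame. Only the "Courses" column name and the 'Spark' value come from the text above; the "Fee" column, the sample rows, and the variable name `wanted` are illustrative.

```python
import pandas as pd

# Hypothetical sample data; only the "Courses" column name is from the text.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark"],
    "Fee": [20000, 25000, 22000],
})

# The comparison builds a boolean mask; indexing with it keeps the True rows.
spark_rows = df[df["Courses"] == "Spark"]

# The same filter with the value held in a variable.
wanted = "Spark"
spark_rows_var = df[df["Courses"] == wanted]
```

Both results keep the original row labels of the matching rows.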
Using loc to Filter With Multiple Conditions: The loc indexer in pandas selects groups of rows or columns by label or by boolean mask. Wrap each condition in parentheses and combine them with the & operator; the result is a pandas DataFrame containing the filtered rows.
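As a rough illustration of combining conditions with loc, here is a sketch that reuses the 'col1'/'col2' data from the question. The variable name `filtered` is just a placeholder.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["foo", "bar", "baz"],
    "col2": ["bar", "spam", "ham"],
})

# Each comparison is parenthesized because & binds more tightly than ==.
filtered = df.loc[(df["col1"] == "foo") & (df["col2"] == "bar")]
```

Only the first row satisfies both conditions, so `filtered` contains that single row.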
pandas lets us filter a DataFrame by selecting rows based on one or more conditions. The filter can be as simple as a single condition or as complex as a query that combines several conditions.
To the best of my knowledge, there is no way in pandas to do exactly what you want. However, although the following solution may not be the prettiest, you can zip a pair of parallel lists as follows:
cols = ['col1', 'col2']
conditions = ['foo', 'bar']
df[eval(" & ".join(["(df['{0}'] == '{1}')".format(col, cond)
                    for col, cond in zip(cols, conditions)]))]
The string join results in the following:
>>> " & ".join(["(df['{0}'] == '{1}')".format(col, cond)
...             for col, cond in zip(cols, conditions)])
"(df['col1'] == 'foo') & (df['col2'] == 'bar')"
You then evaluate this string with eval, which is effectively:
df[eval("(df['col1'] == 'foo') & (df['col2'] == 'bar')")]
For example:
>>> df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})
>>> df
  col1  col2
0  foo   bar
1  bar  spam
2  baz   ham
>>> df[eval(" & ".join(["(df['{0}'] == {1})".format(col, repr(cond))
...                     for col, cond in zip(cols, conditions)]))]
  col1 col2
0  foo  bar
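The example above formats the value with repr(cond) rather than hand-written quotes. A small sketch of why that can matter, assuming a hypothetical value that contains an apostrophe (the helper `is_valid_expression` is illustrative and not part of the answer):

```python
col, cond = "col1", "it's"  # hypothetical value containing an apostrophe

# Hand-written quotes: the apostrophe closes the string literal too early.
naive = "(df['{0}'] == '{1}')".format(col, cond)

# repr() chooses a quoting style that produces a valid string literal.
safe = "(df['{0}'] == {1})".format(col, repr(cond))


def is_valid_expression(source):
    """Return True if `source` parses as a Python expression."""
    try:
        compile(source, "<filter>", "eval")
        return True
    except SyntaxError:
        return False
```

Here `is_valid_expression(naive)` is False while `is_valid_expression(safe)` is True. Keep in mind that eval executes whatever code the string contains, so this approach is only reasonable when the column names and values come from a trusted source.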
I would like to point out an alternative to the accepted answer, as eval is not necessary for solving this problem.
from functools import reduce

import pandas as pd

df = pd.DataFrame({'col1': ['foo', 'bar', 'baz'], 'col2': ['bar', 'spam', 'ham']})
cols = ['col1', 'col2']
values = ['foo', 'bar']
# Materialize the pairs: in Python 3, zip() returns an iterator with no len()
conditions = list(zip(cols, values))

# Version 1: combine the per-column comparisons with an explicit loop
def apply_conditions(df, conditions):
    assert len(conditions) > 0
    comps = [df[c] == v for c, v in conditions]
    result = comps[0]
    for comp in comps[1:]:
        result &= comp
    return result

# Version 2: the same logic, folded with reduce
def apply_conditions(df, conditions):
    assert len(conditions) > 0
    comps = [df[c] == v for c, v in conditions]
    return reduce(lambda c1, c2: c1 & c2, comps[1:], comps[0])

df[apply_conditions(df, conditions)]
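For comparison, here is a sketch of a vectorized variant that is not taken from either answer above. It assumes the target values can be put in a pandas Series labelled by column name, so that DataFrame.eq can align each value with its column; the names `targets`, `mask`, and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["foo", "bar", "baz"],
    "col2": ["bar", "spam", "ham"],
})
cols = ["col1", "col2"]
values = ["foo", "bar"]

# One target value per column, labelled by column name for alignment.
targets = pd.Series(values, index=cols)

# Compare each selected column with its target, then keep the rows
# where every comparison is True.
mask = df[cols].eq(targets, axis="columns").all(axis=1)
result = df[mask]
```

Because `cols` and `values` are ordinary lists, the same code should work unchanged when the number or names of the columns vary, which is the requirement stated in the question.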