If you know exactly how you want to filter a dataframe, the solution is trivial:
df[(df.A == 1) & (df.B == 1)]
But what if you are accepting user input and do not know beforehand how many criteria the user wants to use? For example, the user wants a filtered data frame where columns [A, B, C] == 1. Is it possible to do something like:
def filterIt(*args, value):
return df[(df.*args == value)]
so if the user calls filterIt(A, B, C, value=1)
, it returns:
df[(df.A == 1) & (df.B == 1) & (df.C == 1)]
I think the most elegant way to do this is using df.query()
, where you can build up a string with all your conditions, e.g.:
import pandas as pd
import numpy as np
cols = {}
for col in ('A', 'B', 'C', 'D', 'E'):
cols[col] = np.random.randint(1, 5, 20)
df = pd.DataFrame(cols)
def filter_df(df, filter_cols, value):
conditions = []
for col in filter_cols:
conditions.append('{c} == {v}'.format(c=col, v=value))
query_expr = ' and '.join(conditions)
print('querying with: {q}'.format(q=query_expr))
return df.query(query_expr)
Example output (your results may differ due to the randomly generated data):
filter_df(df, ['A', 'B'], 1)
querying with: A == 1 and B == 1
A B C D E
6 1 1 1 2 1
11 1 1 2 3 4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With