Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas: Is it possible to filter a dataframe with arbitrarily long boolean criteria?

Tags:

python

pandas

If you know exactly how you want to filter a dataframe, the solution is trivial:

df[(df.A == 1) & (df.B == 1)]

But what if you are accepting user input and do not know beforehand how many criteria the user wants to use? For example, the user wants a filtered data frame where columns [A, B, C] == 1. Is it possible to do something like:

def filterIt(*args, value):
    return df[(df.*args == value)]

so if the user calls filterIt(A, B, C, value=1), it returns:

df[(df.A == 1) & (df.B == 1) & (df.C == 1)]
like image 744
yobogoya Avatar asked Feb 09 '16 22:02

yobogoya


1 Answers

I think the most elegant way to do this is using df.query(), where you can build up a string with all your conditions, e.g.:

import pandas as pd
import numpy as np

cols = {}
for col in ('A', 'B', 'C', 'D', 'E'):
    cols[col] = np.random.randint(1, 5, 20)
df = pd.DataFrame(cols)

def filter_df(df, filter_cols, value):
    conditions = []
    for col in filter_cols:
        conditions.append('{c} == {v}'.format(c=col, v=value))
    query_expr = ' and '.join(conditions)
    print('querying with: {q}'.format(q=query_expr))
    return df.query(query_expr)

Example output (your results may differ due to the randomly generated data):

filter_df(df, ['A', 'B'], 1)
querying with: A == 1 and B == 1
    A  B  C  D  E
6   1  1  1  2  1
11  1  1  2  3  4
like image 191
Marius Avatar answered Oct 28 '22 01:10

Marius