Filter a pandas dataframe using values from a dict

Tags: python, pandas

I need to filter a data frame using a dict whose keys are column names and whose values are the values I want to filter on:

filter_v = {'A': 1, 'B': 0, 'C': 'This is right'}
# this would be the normal approach
df[(df['A'] == 1) & (df['B'] == 0) & (df['C'] == 'This is right')]

But I want to do something along the lines of:

for column, value in filter_v.items():
    df[df[column] == value]

but this will filter the data frame several times, one value at a time, and not apply all filters at the same time. Is there a way to do it programmatically?

EDIT: an example:

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan],
                    'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}
df1.loc[df1[filter_v.keys()].isin(filter_v.values()).all(axis=1), :]

gives

   A  B      C  D
0  1  1  right  1
1  0  1  right  2
3  1  0  right  3

but the expected result was

   A  B      C  D
3  1  0  right  3

only the last one should be selected.


asked Dec 08 '15 13:12

Ivan

2 Answers

IIUC, you should be able to do something like this:

>>> df1.loc[(df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1)]
   A  B      C  D
3  1  0  right  3

This works by making a Series to compare against:

>>> pd.Series(filter_v)
A        1
B        0
C    right
dtype: object

Selecting the corresponding part of df1:

>>> df1[list(filter_v)]
     A      C  B
0    1  right  1
1    0  right  1
2    1  wrong  1
3    1  right  0
4  NaN  right  1

Finding where they match:

>>> df1[list(filter_v)] == pd.Series(filter_v)
       A      B      C
0   True  False   True
1  False  False   True
2   True  False  False
3   True   True   True
4  False  False   True

Finding where they all match:

>>> (df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1)
0    False
1    False
2    False
3     True
4    False
dtype: bool

And finally using this to index into df1:

>>> df1.loc[(df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1)]
   A  B      C  D
3  1  0  right  3
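The steps above can be wrapped into a small reusable function. This is only a sketch built on the df1 and filter_v from the question; the name filter_by_dict is hypothetical and not part of pandas:

```python
import numpy as np
import pandas as pd


def filter_by_dict(df, conditions):
    """Keep the rows where every column in `conditions` equals its value.

    `filter_by_dict` is a hypothetical helper name, not a pandas API.
    """
    cols = list(conditions)
    # Comparing a DataFrame with a Series aligns the Series index
    # with the DataFrame columns, so each column gets its own value.
    matches = df[cols] == pd.Series(conditions)
    # A row is kept only if all of its comparisons are True.
    return df.loc[matches.all(axis=1)]


df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan],
                    'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

result = filter_by_dict(df1, filter_v)
print(result)
```

Note that the NaN in column A compares as not equal, so row 4 is dropped, which is what the question expects.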

answered Oct 09 '22 08:10

DSM

Here is a way to do it:

df.loc[df[filter_v.keys()].isin(filter_v.values()).all(axis=1), :]
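On the question's example this keeps rows 0 and 1 as well, because isin with a flat list of values tests each cell against all of the values, regardless of column. DataFrame.isin also accepts a dict mapping column names to the allowed values for that column. A minimal sketch of that variant, assuming the df1 and filter_v from the question (the variable names per_column and exact are only illustrative):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan],
                    'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

cols = list(filter_v)

# Dict form of isin: each column is checked only against its own list
# of allowed values, so a 1 in column B no longer counts as a match.
per_column = {col: [value] for col, value in filter_v.items()}
exact = df1.loc[df1[cols].isin(per_column).all(axis=1)]
print(exact)
```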

UPDATE:

When the same values can occur in more than one column, as in your example, isin matches too many rows. You could then do something like this:

# Create your filtering function:
def filter_dict(df, dic):
    # Compare each row (restricted to the dict's columns) with the dict's values
    return df[df[list(dic)].apply(
        lambda x: x.equals(pd.Series(list(dic.values()), index=x.index, name=x.name)),
        axis=1)]

# Use it on your DataFrame:
filter_dict(df1, filter_v)

Which yields:

   A  B      C  D 3  1  0  right  3

If it is something that you do frequently, you could go as far as patching DataFrame for easy access to this filter:

pd.DataFrame.filter_dict_ = filter_dict

And then use this filter like this:

df1.filter_dict_(filter_v)

Which would yield the same result.

BUT, it is not the right way to do it, clearly. I would use DSM's approach.
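For comparison, the loop from the question can also be made to apply every condition at once by accumulating a single boolean mask instead of filtering repeatedly. A minimal sketch, assuming the df1 and filter_v from the question; filter_with_loop is a hypothetical name:

```python
import numpy as np
import pandas as pd


def filter_with_loop(df, conditions):
    """Combine one equality test per dict entry into a single mask.

    `filter_with_loop` is a hypothetical helper name, not a pandas API.
    """
    # Start with every row selected, then narrow the mask column by column.
    mask = pd.Series(True, index=df.index)
    for column, value in conditions.items():
        mask &= df[column] == value
    return df[mask]


df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan],
                    'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

looped = filter_with_loop(df1, filter_v)
print(looped)
```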

answered Oct 09 '22 08:10

Primer
