 

Filter a pandas dataframe using values from a dict

Tags:

python

pandas

I need to filter a data frame with a dict whose keys are column names and whose values are the values I want to filter on:

    filter_v = {'A': 1, 'B': 0, 'C': 'This is right'}

    # this would be the normal approach
    df[(df['A'] == 1) & (df['B'] == 0) & (df['C'] == 'This is right')]

But I want to do something along these lines:

    for column, value in filter_v.items():
        df[df[column] == value]

but this filters the data frame separately for each column/value pair instead of applying all the filters at the same time. Is there a way to do this programmatically?
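The loop can work if, instead of filtering the frame on each pass, every pass ANDs its condition into one boolean mask that is applied once at the end. A minimal sketch, assuming the example data from the EDIT below:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan],
                    'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

# Start from an all-True mask aligned with the frame's index
mask = pd.Series(True, index=df1.index)
for column, value in filter_v.items():
    # Combine this column's condition with the conditions collected so far
    mask &= df1[column] == value

# Apply all conditions at once
result = df1[mask]
print(result)
```

Reassigning the frame inside the loop (`df1 = df1[df1[column] == value]`) would also narrow it step by step to the same rows; the mask version just keeps the original frame intact.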

EDIT: an example:

    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan],
                        'B': [1, 1, 1, 0, 1],
                        'C': ['right', 'right', 'wrong', 'right', 'right'],
                        'D': [1, 2, 2, 3, 4]})
    filter_v = {'A': 1, 'B': 0, 'C': 'right'}
    df1.loc[df1[filter_v.keys()].isin(filter_v.values()).all(axis=1), :]

gives

         A  B      C  D
    0    1  1  right  1
    1    0  1  right  2
    3    1  0  right  3

but the expected result was

       A  B      C  D
    3  1  0  right  3

Only the last row (index 3) should be selected.
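The attempt above also returns rows 0 and 1 because `isin` checks every cell against the whole collection of dict values, not against the value for that cell's own column: a 1 in column B counts as a match simply because 1 is one of the values. A small sketch that prints the cell-by-cell result, using the same data:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan],
                    'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

# Each cell is tested for membership in [1, 0, 'right'], regardless of its column
matches = df1[list(filter_v)].isin(list(filter_v.values()))
print(matches)

# Row 0 passes even though its B value is 1, not 0
rows = matches.all(axis=1)
print(df1[rows].index.tolist())
```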

asked Dec 08 '15 13:12 by Ivan





2 Answers

IIUC, you should be able to do something like this:

    >>> df1.loc[(df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1)]
       A  B      C  D
    3  1  0  right  3

This works by making a Series to compare against:

    >>> pd.Series(filter_v)
    A        1
    B        0
    C    right
    dtype: object
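The comparison works by lining the Series up with the DataFrame's columns. If the selected columns might come out in a different order from the Series index, the `DataFrame.eq` method is a safer spelling, since it aligns on labels before comparing (depending on the pandas version, a plain `==` with mismatched label order may warn or raise instead). A sketch with the columns deliberately shuffled:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan],
                    'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}
target = pd.Series(filter_v)

# Same columns as target.index, but in a different order
subset = df1[['C', 'A', 'B']]

# eq() aligns target's labels with subset's columns before comparing
mask = subset.eq(target).all(axis=1)
print(df1.loc[mask])
```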

Selecting the corresponding part of df1:

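    >>> df1[list(filter_v)]
         A      C  B
    0    1  right  1
    1    0  right  1
    2    1  wrong  1
    3    1  right  0
    4  NaN  right  1

(The column order A, C, B comes from the unordered dicts of older Python versions; on Python 3.7+, `list(filter_v)` follows insertion order, so the columns come out as A, B, C.)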

Finding where they match:

    >>> df1[list(filter_v)] == pd.Series(filter_v)
           A      B      C
    0   True  False   True
    1  False  False   True
    2   True  False  False
    3   True   True   True
    4  False  False   True
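Row 4 fails the A comparison because NaN never compares equal to anything, not even another NaN, so a dict value of `np.nan` could never match with `==`. If missing values need to be selectable, one option is a hypothetical helper such as `match_mask` below, which switches to `isna()` for missing filter values. It assumes the filter values are scalars:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan],
                    'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})

def match_mask(df, filters):
    # Hypothetical helper: ANDs one condition per column; a missing-value
    # filter (np.nan or None) is matched with isna() because NaN == NaN is False.
    mask = pd.Series(True, index=df.index)
    for col, val in filters.items():
        if pd.isna(val):  # assumes scalar filter values
            mask &= df[col].isna()
        else:
            mask &= df[col] == val
    return mask

# Plain equality never matches NaN
plain = (df1[['A']] == pd.Series({'A': np.nan})).all(axis=1)
print(plain.any())

# The NaN-aware helper finds row 4
nan_aware = match_mask(df1, {'A': np.nan, 'C': 'right'})
print(df1[nan_aware])
```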

Finding where they all match:

    >>> (df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1)
    0    False
    1    False
    2    False
    3     True
    4    False
    dtype: bool

And finally using this to index into df1:

    >>> df1.loc[(df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1)]
       A  B      C  D
    3  1  0  right  3
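Another programmatic option, not part of either answer, is to build the question's "normal approach" expression as a string and pass it to `DataFrame.query`. A sketch, assuming simple scalar values (ints, floats, strings) whose `repr()` is valid inside a query expression; it would not handle NaN, for example:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan],
                    'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

# Backticks quote each column name; repr() quotes string values
expr = " and ".join(f"`{col}` == {val!r}" for col, val in filter_v.items())
print(expr)  # `A` == 1 and `B` == 0 and `C` == 'right'

result = df1.query(expr)
print(result)
```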
answered Oct 09 '22 08:10 by DSM



Here is a way to do it:

    df.loc[df[filter_v.keys()].isin(filter_v.values()).all(axis=1), :]
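`DataFrame.isin` also accepts a dict mapping column names to lists of allowed values, in which case each column is checked only against its own list. Wrapping each filter value in a one-element list keeps this `isin`-based idea but avoids the cross-column matches shown in the question's EDIT. A sketch using the question's data:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan],
                    'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

# Map each column to a one-element list so it is only checked against its own value
per_column = {col: [val] for col, val in filter_v.items()}
mask = df1[list(filter_v)].isin(per_column).all(axis=1)
result = df1.loc[mask]
print(result)
```

A side benefit of the dict form is that a column can accept several values, e.g. `{'C': ['right', 'wrong']}`.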

UPDATE:

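Because the same value can appear in several columns (so `isin` also accepts, say, a 1 in column B), you could instead compare each row against the dict's values in key order, with something like this: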

    import pandas as pd

    # Create your filtering function:
    def filter_dict(df, dic):
        # Compare each row of the selected columns with the dict's values, in key order
        return df[df[list(dic)].apply(
            lambda x: x.equals(pd.Series(list(dic.values()), index=x.index, name=x.name)),
            axis=1)]

    # Use it on your DataFrame:
    filter_dict(df1, filter_v)

Which yields:

       A  B      C  D
    3  1  0  right  3
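One caveat: in recent pandas versions, `Series.equals` also requires matching dtypes, so this row-wise check can miss rows whose values are equal but stored as int in one Series and float in the other. In the sketch below, filtering only on the numeric columns A and B (A is float because of the NaN) returns nothing through `filter_dict`, while the element-wise `==` comparison from DSM's answer still finds row 3:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan],
                    'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})

def filter_dict(df, dic):
    # The row-wise equals() comparison from this answer
    return df[df[list(dic)].apply(
        lambda x: x.equals(pd.Series(list(dic.values()), index=x.index, name=x.name)),
        axis=1)]

numeric_filter = {'A': 1, 'B': 0}

# Rows of df1[['A', 'B']] are float64, while pd.Series([1, 0]) is int64,
# so equals() returns False for every row
via_equals = filter_dict(df1, numeric_filter)

# Element-wise == treats 1.0 and 1 as equal
via_eq = df1.loc[(df1[list(numeric_filter)] == pd.Series(numeric_filter)).all(axis=1)]
print(via_equals.empty, via_eq.index.tolist())
```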

If it is something you do frequently, you could go as far as patching DataFrame for easy access to this filter:

    pd.DataFrame.filter_dict_ = filter_dict

And then use this filter like this:

    df1.filter_dict_(filter_v)
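If patching the class feels too invasive, `DataFrame.pipe` lets a plain function be used in a method chain without touching `pd.DataFrame`. A sketch that wraps DSM's comparison in a hypothetical helper named `filter_by_dict`:

```python
import numpy as np
import pandas as pd

def filter_by_dict(df, filters):
    # Hypothetical helper: DSM's element-wise comparison wrapped in a function
    target = pd.Series(filters)
    return df.loc[(df[list(filters)] == target).all(axis=1)]

df1 = pd.DataFrame({'A': [1, 0, 1, 1, np.nan],
                    'B': [1, 1, 1, 0, 1],
                    'C': ['right', 'right', 'wrong', 'right', 'right'],
                    'D': [1, 2, 2, 3, 4]})
filter_v = {'A': 1, 'B': 0, 'C': 'right'}

# pipe(f, arg) calls f(df1, arg), so it chains like a method
result = df1.pipe(filter_by_dict, filter_v)
print(result)
```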

Which would yield the same result.

But this is clearly not the right way to do it; I would use DSM's approach.

answered Oct 09 '22 08:10 by Primer
