I need to filter a data frame with a dict, constructed with the key being the column name and the value being the value that I want to filter:
filter_v = {'A':1, 'B':0, 'C':'This is right'} # this would be the normal approach df[(df['A'] == 1) & (df['B'] ==0)& (df['C'] == 'This is right')]
But I want to do something on the lines
for column, value in filter_v.items(): df[df[column] == value]
but this will filter the data frame several times, one value at a time, and not apply all filters at the same time. Is there a way to do it programmatically?
EDIT: an example:
df1 = pd.DataFrame({'A':[1,0,1,1, np.nan], 'B':[1,1,1,0,1], 'C':['right','right','wrong','right', 'right'],'D':[1,2,2,3,4]}) filter_v = {'A':1, 'B':0, 'C':'right'} df1.loc[df1[filter_v.keys()].isin(filter_v.values()).all(axis=1), :]
gives
A B C D 0 1 1 right 1 1 0 1 right 2 3 1 0 right 3
but the expected result was
A B C D 3 1 0 right 3
only the last one should be selected.
Using query() to Filter by Column Value in pandas DataFrame. query() function is used to filter rows based on column value in pandas. After applying the expression, it returns a new DataFrame. If you wanted to update the existing DataFrame use inplace=True param.
You can convert dictionary to pandas dataframe by creating a list of Dictionary items using the list(my_dict. items()) . Also, you can pass the column header values using the columns paramter. When the values of the Dictionary keys are not a list of values.
IIUC, you should be able to do something like this:
>>> df1.loc[(df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1)] A B C D 3 1 0 right 3
This works by making a Series to compare against:
>>> pd.Series(filter_v) A 1 B 0 C right dtype: object
Selecting the corresponding part of df1
:
>>> df1[list(filter_v)] A C B 0 1 right 1 1 0 right 1 2 1 wrong 1 3 1 right 0 4 NaN right 1
Finding where they match:
>>> df1[list(filter_v)] == pd.Series(filter_v) A B C 0 True False True 1 False False True 2 True False False 3 True True True 4 False False True
Finding where they all match:
>>> (df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1) 0 False 1 False 2 False 3 True 4 False dtype: bool
And finally using this to index into df1:
>>> df1.loc[(df1[list(filter_v)] == pd.Series(filter_v)).all(axis=1)] A B C D 3 1 0 right 3
Here is a way to do it:
df.loc[df[filter_v.keys()].isin(filter_v.values()).all(axis=1), :]
UPDATE:
With values being the same across columns you could then do something like this:
# Create your filtering function: def filter_dict(df, dic): return df[df[dic.keys()].apply( lambda x: x.equals(pd.Series(dic.values(), index=x.index, name=x.name)), asix=1)] # Use it on your DataFrame: filter_dict(df1, filter_v)
Which yields:
A B C D 3 1 0 right 3
If it something that you do frequently you could go as far as to patch DataFrame for an easy access to this filter:
pd.DataFrame.filter_dict_ = filter_dict
And then use this filter like this:
df1.filter_dict_(filter_v)
Which would yield the same result.
BUT, it is not the right way to do it, clearly. I would use DSM's approach.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With