As you know I can do df[df.column.isin(set)]
to get the part of the DataFrame where the column value is in that set. But now my source set is dependent on the value of column1. How do I make the function lookup a dictionary for the source set on the go, as it filters the dataframe?
let's say i have
dict1={'a':[1,2,3],'b':[1,2],'c':[4,5,6]}
and my df looks like
column1 column2
a 4
b 2
c 6
and the new df should look like
column1 column2
b 2
c 6
Since 4 isn't in the list that 'a' points to in dict1.
new_df = df[df.column2.isin(dict1[df.column1])]
This gives me TypeError. I understand since i'm using pd.Series as keys and it's not hashable.
The problem is that isin
expect a sequence of values, and not a Series of sequences. Said differently it allows vectorization on keys but not on values.
So you have to use a non vectorized way here, for example:
df[df.apply(lambda x: x['column2'] in dict1[x['column1']], axis=1)]
You could do with a list comprehension and pandas.concat
. In the comprehension, use boolean indexing
with logical AND (&
) operator:
df_new = pd.concat([df[df['column1'].eq(k) & df['column2'].isin(v)] for k, v in dict1.items()])
[out]
column1 column2
1 b 2
2 c 6
Another approach would be to restructure your dict
as a DataFrame
and merge
:
df_dict = pd.DataFrame([(k, i) for k, v in dict1.items() for i in v], columns=['column1', 'column2'])
df.merge(df_dict, how='inner', on=['column1', 'column2'])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With