Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use series.isin with different sets for different values?

Tags:

python

pandas

As you know I can do df[df.column.isin(set)] to get the part of the DataFrame where the column value is in that set. But now my source set is dependent on the value of column1. How do I make the function lookup a dictionary for the source set on the go, as it filters the dataframe?

let's say i have

dict1={'a':[1,2,3],'b':[1,2],'c':[4,5,6]}

and my df looks like

column1  column2
a        4
b        2
c        6

and the new df should look like

column1  column2
b        2
c        6

Since 4 isn't in the list that 'a' points to in dict1.

new_df = df[df.column2.isin(dict1[df.column1])] This gives me TypeError. I understand since i'm using pd.Series as keys and it's not hashable.

like image 587
Xuekai Du Avatar asked Jun 12 '19 09:06

Xuekai Du


2 Answers

The problem is that isin expect a sequence of values, and not a Series of sequences. Said differently it allows vectorization on keys but not on values.

So you have to use a non vectorized way here, for example:

df[df.apply(lambda x: x['column2'] in dict1[x['column1']], axis=1)]
like image 97
Serge Ballesta Avatar answered Nov 14 '22 23:11

Serge Ballesta


You could do with a list comprehension and pandas.concat. In the comprehension, use boolean indexing with logical AND (&) operator:

df_new = pd.concat([df[df['column1'].eq(k) & df['column2'].isin(v)] for k, v in dict1.items()])

[out]

  column1  column2
1       b        2
2       c        6

Another approach would be to restructure your dict as a DataFrame and merge:

df_dict = pd.DataFrame([(k, i) for k, v in dict1.items() for i in v], columns=['column1', 'column2'])

df.merge(df_dict, how='inner', on=['column1', 'column2'])
like image 30
Chris Adams Avatar answered Nov 15 '22 00:11

Chris Adams