
How to remove DataFrame rows where a column's values are in a set?

I have a set

remove_set

I want to remove all rows in a dataframe where a column value is in that set.

df = df[df.column_in_set not in remove_set]

This gives me the error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

What is the most pandas/pythonic way to solve this problem? I could iterate through the rows and figure out the ilocs to exclude, but that seems a little inelegant.

Some sample input and expected output.

Input:

 column_in_set value_2 value_3
 1             'a'      3
 2             'b'      4
 3             'c'      5
 4             'd'      6

remove = set([2,4])

Output:

column_in_set value_2 value_3
1             'a'      3
3             'c'      5

Andrew asked Aug 03 '15 17:08


1 Answer

To make the selection you can write:

df[~df['column_in_set'].isin(remove)]

isin() simply checks if each value of the column/Series is in a set (or list or other iterable), returning a boolean Series.
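As a rough illustration of that boolean Series, here is a minimal sketch. The DataFrame is rebuilt from the question's sample input, and the name `mask` is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's input (an assumption about dtypes).
df = pd.DataFrame({
    "column_in_set": [1, 2, 3, 4],
    "value_2": ["a", "b", "c", "d"],
    "value_3": [3, 4, 5, 6],
})
remove = {2, 4}

# True where the column value is a member of `remove`, False elsewhere.
mask = df["column_in_set"].isin(remove)
print(mask.tolist())  # [False, True, False, True]

# Passing a list with the same values produces the same mask.
same_mask = df["column_in_set"].isin([2, 4])
print(mask.equals(same_mask))  # True
```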

In this case, we want to keep only the rows of the DataFrame whose column_in_set value is not in remove, so we invert the boolean values with ~ and then use the result to index the DataFrame.
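Applied to the question's sample data, the whole selection might look like the sketch below. The DataFrame is rebuilt from the question's input, and `keep` and `result` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question's input (an assumption about dtypes).
df = pd.DataFrame({
    "column_in_set": [1, 2, 3, 4],
    "value_2": ["a", "b", "c", "d"],
    "value_3": [3, 4, 5, 6],
})
remove = {2, 4}

# isin() marks the rows to drop; ~ flips the mask so it marks the rows to keep.
keep = ~df["column_in_set"].isin(remove)

# Boolean indexing returns only the rows where `keep` is True.
result = df[keep]
print(result["column_in_set"].tolist())  # [1, 3]
print(result["value_2"].tolist())        # ['a', 'c']
```

The filtered rows keep their original index labels (0 and 2 here). If a fresh 0-based index is wanted, `result.reset_index(drop=True)` should provide one.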

Alex Riley answered Nov 03 '22 20:11