Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filtering pandas data frame by a list of id's

I have a pandas dataframe, which has a list of user id's 'subscriber_id' and some other info.

I want to only select subscribers not in a given list A.

So if our data frame contains info for subscribers [1,2,3,4,5] and my exclude list is [2,4,5], I should now get a dataframe with information for [1,3]

I have tried using a mask as follows:

temp = df.mask(lambda x: x['subscriber_id'] not in subscribers)

but no luck!

I am sure the not in is valid Python syntax, as I tested it on a list as follows:

c = [1,2,3,4,5]
if 5 not in c:
    print 'YAY'
>> YAY

Any suggestion or alternative way to filter the dataframe?

like image 804
redrubia Avatar asked May 19 '14 19:05

redrubia


People also ask

How do you filter records in a data frame?

You can use df[df["Courses"] == 'Spark'] to filter rows by a condition in pandas DataFrame. Not that this expression returns a new DataFrame with selected rows.


1 Answers

You could use the isin method:

In [30]: df = pd.DataFrame({'subscriber_id':[1,2,3,4,5]})

In [31]: df
Out[31]: 
   subscriber_id
0              1
1              2
2              3
3              4
4              5

[5 rows x 1 columns]

In [32]: mask = df['subscriber_id'].isin([2,4,5])

In [33]: mask
Out[33]: 
0    False
1     True
2    False
3     True
4     True
Name: subscriber_id, dtype: bool

In [34]: df.loc[~mask]
Out[34]: 
   subscriber_id
0              1
2              3

[2 rows x 1 columns]

If you use df.mask, then the input must be a boolean NDFrame or an array. lambda x: x['subscriber_id'] not in subscribers is a function, which is why it raised an exception.

Here is one way you could use df.mask, again with isin to form the boolean condition:

In [43]: df['subscriber_id'].mask(df['subscriber_id'].isin([2,4,5]).values)
Out[43]: 
0     1
1   NaN
2     3
3   NaN
4   NaN
Name: subscriber_id, dtype: float64
like image 197
unutbu Avatar answered Sep 19 '22 06:09

unutbu