I have a pandas dataframe, which has a list of user id's 'subscriber_id' and some other info.
I want to only select subscribers not in a given list A.
So if our data frame contains info for subscribers [1,2,3,4,5] and my exclude list is [2,4,5], I should now get a dataframe with information for [1,3]
I have tried using a mask as follows:
temp = df.mask(lambda x: x['subscriber_id'] not in subscribers)
but no luck!
I am sure the not in
is valid Python syntax, as I tested it on a list as follows:
c = [1,2,3,4,5]
if 5 not in c:
print 'YAY'
>> YAY
Any suggestion or alternative way to filter the dataframe?
You can use df[df["Courses"] == 'Spark'] to filter rows by a condition in pandas DataFrame. Not that this expression returns a new DataFrame with selected rows.
You could use the isin
method:
In [30]: df = pd.DataFrame({'subscriber_id':[1,2,3,4,5]})
In [31]: df
Out[31]:
subscriber_id
0 1
1 2
2 3
3 4
4 5
[5 rows x 1 columns]
In [32]: mask = df['subscriber_id'].isin([2,4,5])
In [33]: mask
Out[33]:
0 False
1 True
2 False
3 True
4 True
Name: subscriber_id, dtype: bool
In [34]: df.loc[~mask]
Out[34]:
subscriber_id
0 1
2 3
[2 rows x 1 columns]
If you use df.mask
, then the input must be a boolean NDFrame or an array. lambda x: x['subscriber_id'] not in subscribers
is a function, which is why it raised an exception.
Here is one way you could use df.mask
, again with isin
to form the boolean condition:
In [43]: df['subscriber_id'].mask(df['subscriber_id'].isin([2,4,5]).values)
Out[43]:
0 1
1 NaN
2 3
3 NaN
4 NaN
Name: subscriber_id, dtype: float64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With