Given a dataframe full of emails, I want to filter out rows containing potentially blocked domain names or clearly fake emails. The dataframe below represents an example of my data.
>> print(df)
email number
1 [email protected] 2
2 [email protected] 1
3 [email protected] 5
4 [email protected] 2
5 [email protected] 1
I want to filter by two lists. The first list is fake_lst = ['noemail', 'noaddress', 'fake', ... 'no.email'].
The second list is just the blocklist set obtained with from disposable_email_domains import blocklist, converted to a list (or kept as a set).
When I use df = df[~df['email'].str.contains('noemail')] it works fine and filters out that entry. Yet when I do df = df[~df['email'].str.contains(fake_lst)] I get TypeError: unhashable type: 'list'.
The obvious answer is to use df = df[~df['email'].isin(fake_lst)], as in many other Stack Overflow questions, like "Filter Pandas Dataframe based on List of substrings" or "pandas filtering using isin function", but that ends up having no effect.
I suppose I could use str.contains('string') for each possible list entry, but that is ridiculously cumbersome.
Therefore, I need to filter this dataframe by the substrings in the two lists, so that every row whose email contains one of those substrings is removed.
In the example above, the dataframe after filtering would be:
>> print(df)
email number
2 [email protected] 1
4 [email protected] 2
5 [email protected] 1
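Use Series.isin to check whether each element of the column is contained in a list of values. isin tests for exact equality, which is why applying it to the full address has no effect: no address is equal to 'noemail' on its own. Your fake list contains the name without the domain, so use str.split to drop everything from the @ onward before matching. Keep in mind that this only matches when the whole part before the @ equals a list entry, not when it merely contains one.
Note: str.contains tests whether a pattern or regex is contained within each string of a Series, so df['email'].str.contains('noemail') works fine, but it expects a single pattern string and does not accept a list.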
df[~df['email'].str.split('@').str[0].isin(fake_lst)]
email number
0 [email protected] 2
1 [email protected] 1
3 [email protected] 2
4 [email protected] 1
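If you need true substring matching rather than exact matches on the part before the @, one possible approach is to join the list into a single regex and pass that to str.contains, then check the domain separately with isin. The following is only a sketch under stated assumptions: the sample addresses are made up for illustration, and the one-entry blocklist set stands in for the real set from disposable_email_domains.

```python
import re

import pandas as pd

# Hypothetical sample data; these addresses are invented for illustration.
df = pd.DataFrame({
    "email": [
        "noemail@example.com",
        "alice@example.com",
        "fake.person@example.org",
        "bob@example.net",
        "carol@mailinator.com",
    ],
    "number": [2, 1, 5, 2, 1],
})

fake_lst = ["noemail", "noaddress", "fake", "no.email"]

# Assumption: a tiny stand-in for disposable_email_domains.blocklist.
blocklist = {"mailinator.com"}

# Escape each entry so that the "." in "no.email" is matched literally,
# then join the entries with "|" to build one alternation pattern.
pattern = "|".join(re.escape(s) for s in fake_lst)
has_fake = df["email"].str.contains(pattern, case=False, regex=True)

# Take everything after the last "@" as the domain and compare it exactly.
domain = df["email"].str.rsplit("@", n=1).str[-1].str.lower()
is_blocked = domain.isin(blocklist)

# Keep only the rows that trigger neither check.
filtered = df[~(has_fake | is_blocked)]
print(filtered)
```

The substring check is deliberately broad: it would also drop a legitimate address that happens to contain "fake" somewhere, so the list entries are worth reviewing before relying on it.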