
REGEX filter with Pandas (any numeric combination followed by 'plus' sign)

I have a Pandas dataframe called df with the following 3 columns: id, creation_date and email.

I want to return all rows where the part of the email before the 'plus' sign is strictly numeric: one or more digits, followed by a 'plus' sign, followed by anything.

For example:
- [email protected], [email protected] will meet my criteria.
- [email protected] and [email protected] will not, because they contain non-numeric characters before the 'plus' sign.

I know df.email.str.contains('\+') won't work because it returns every row that contains a 'plus' sign. I tried df.filter(['email'], regex=r'([^0-9])' % '\+', axis=0), but it raised TypeError: not all arguments converted during string formatting.

Can anyone advise?

Thanks very much!

Stanleyrr asked Jan 13 '18 04:01



2 Answers

You can use contains with a ^ anchor, but match is sufficient because it only matches at the start of the string:

import pandas as pd

# example data
data = ["[email protected]", "[email protected]",
        "[email protected]", "[email protected]"]
df = pd.DataFrame(data, columns=["email"])

df
                   email
0     [email protected]
1  [email protected]
2   [email protected]
3   [email protected]

Now use match:

df.email.str.match(r"\d+\+.*")

0     True
1     True
2    False
3    False
Name: email, dtype: bool
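
The boolean Series is only a mask; to actually return the matching rows, it can be used for boolean indexing. A minimal sketch, assuming hypothetical addresses at example.com (the addresses in the original post are not recoverable) and an id column like the one described in the question:

```python
import pandas as pd

# Hypothetical addresses, used only for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "email": [
        "12345+promo@example.com",   # digits only before '+': keep
        "987+x@example.com",         # digits only before '+': keep
        "abc123+promo@example.com",  # letters before '+': drop
        "john+12@example.com",       # letters before '+': drop
    ],
})

# str.match anchors at the start of each string, so r"\d+\+" means
# "one or more digits, then a literal plus sign".
mask = df["email"].str.match(r"\d+\+")

# Boolean indexing keeps only the rows where the mask is True.
matches = df[mask]
print(matches)
```

If the email column can contain missing values, passing na=False to str.match should keep the mask strictly boolean, so those rows are simply dropped.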

Note the difference between contains and match, from the docs:

contains
analogous, but less strict, relying on re.search instead of re.match
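
The practical effect of that difference can be seen with the standard re module directly. This is a small sketch using a hypothetical address, not one from the question:

```python
import re

pattern = r"\d+\+"

# Hypothetical address with letters before the digits.
email = "abc123+promo@example.com"

# re.match only succeeds if the pattern matches at position 0.
anchored = re.match(pattern, email) is not None

# re.search scans the whole string, so it finds "123+" in the middle.
unanchored = re.search(pattern, email) is not None

# Adding ^ makes re.search behave like re.match for this pattern.
anchored_search = re.search(r"^" + pattern, email) is not None

print(anchored, unanchored, anchored_search)  # False True False
```

This is why an unanchored contains would wrongly accept addresses that have letters before the digits.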

andrew_reece answered Sep 20 '22 15:09



Try this:

df.email.str.contains(r'^\d+\+\@')

Breaking down the regular expression:

^ anchors the match at the beginning of the email string

\d+ matches one or more digit characters

\+ matches a literal plus sign; the backslash stops + from acting as a quantifier

\@ matches a literal @ (the backslash is optional here), so the plus sign must sit immediately before the @. To allow any text between the plus sign and the @, as the question asks, drop the trailing \@.
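
To see how the trailing @ changes which addresses are accepted, the pattern can be compared with a variant that stops after the plus sign. This is a sketch with hypothetical addresses, and the names strict and loose are made up for illustration:

```python
import re

# Hypothetical addresses, used only for illustration.
emails = [
    "123+@example.com",        # plus sign directly before the @
    "123+promo@example.com",   # text between the plus sign and the @
    "abc123+@example.com",     # letters before the digits
]

strict = re.compile(r"^\d+\+@")   # plus sign must touch the @
loose = re.compile(r"^\d+\+")     # anything may follow the plus sign

strict_hits = [bool(strict.search(e)) for e in emails]
loose_hits = [bool(loose.search(e)) for e in emails]

print(strict_hits)  # [True, False, False]
print(loose_hits)   # [True, True, False]
```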

McClAnalytics answered Sep 21 '22 15:09
