REGEX filter with Pandas (any numeric combination followed by 'plus' sign)

Question

I have a Pandas dataframe called df with the following 3 columns: id, creation_date and email.

I want to return all rows where the part of the email before the 'plus' sign consists only of digits; the 'plus' sign may then be followed by anything.

For example:
- 1345677+@gmail.com, 2345678+556@gmail.com will meet my criteria.
- Testing+22@gmail.com and test223+22@gmail.com will not, because they contain non-numeric characters before the 'plus' sign.

I know df.email.str.contains('\+') won't work because it returns every row that contains a 'plus' sign. I also tried df.filter(['email'], regex=r'([^0-9])' % '\+', axis=0), but it raised TypeError: not all arguments converted during string formatting.

Can anyone advise?

Thanks very much!

andrew_reece · Accepted Answer

You can use contains, but match should be sufficient:

import pandas as pd

# example data
data = ["1345677+@gmail.com", "2345678+556@gmail.com",
        "Testing+22@gmail.com", "test223+22@gmail.com"]
df = pd.DataFrame(data, columns=["email"])

df
                   email
0     1345677+@gmail.com
1  2345678+556@gmail.com
2   Testing+22@gmail.com
3   test223+22@gmail.com

Now use match:

df.email.str.match(r"\d+\+.*")

0     True
1     True
2    False
3    False
Name: email, dtype: bool
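
The boolean Series above can be used as a mask to return the matching rows, which is what the question asks for. This is a minimal sketch on the same example data. The names mask and matched are illustrative, and na=False is an added assumption so that missing emails count as non-matches:

```python
import pandas as pd

data = ["1345677+@gmail.com", "2345678+556@gmail.com",
        "Testing+22@gmail.com", "test223+22@gmail.com"]
df = pd.DataFrame(data, columns=["email"])

# str.match only matches at the start of each string (like re.match),
# so the digits must begin the address. na=False treats missing values
# as non-matches instead of propagating NaN into the mask.
mask = df["email"].str.match(r"\d+\+", na=False)

# Boolean indexing keeps only the rows where the mask is True.
matched = df[mask]
print(matched)
```

The trailing .* from the pattern above is dropped here because match does not require the whole string to be consumed.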

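Note the difference between contains and match. The docs for match describe contains as:

contains: analogous, but less strict, relying on re.search instead of re.match

In other words, match only succeeds if the pattern matches at the beginning of the string, while contains succeeds if the pattern matches anywhere in it. With contains and the same pattern, test223+22@gmail.com would also be returned, because "223+" appears inside the string.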
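
The difference can be seen directly with the standard re module, which is what these two pandas methods rely on. This is a small sketch using one of the sample addresses; the variable names are illustrative:

```python
import re

pattern = r"\d+\+"
email = "test223+22@gmail.com"

# re.search scans the whole string and finds "223+" in the middle.
found_anywhere = re.search(pattern, email) is not None

# re.match only tries position 0, where "t" is not a digit.
found_at_start = re.match(pattern, email) is not None

print(found_anywhere, found_at_start)  # True False
```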

McClAnalytics · Answer

Try this:

df.email.str.contains(r'^\d+\+')

Breaking down the regular expression:

^ anchors the match at the beginning of the email string

\d+ matches one or more digit characters

\+ escapes the plus sign so that it matches a literal '+'

Appending @ to the pattern (r'^\d+\+@') would additionally require the plus sign to sit immediately before the @. That stricter pattern matches 1345677+@gmail.com but not 2345678+556@gmail.com, which the question also wants to keep. The @ itself does not need to be escaped.
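
To see how including @ in the pattern changes the result, here is a small sketch comparing the two anchored patterns on the sample addresses from the question. The variable names are illustrative:

```python
import pandas as pd

emails = pd.Series(["1345677+@gmail.com", "2345678+556@gmail.com",
                    "Testing+22@gmail.com", "test223+22@gmail.com"])

# Digits from the start of the string, then a literal plus sign.
digits_then_plus = emails.str.contains(r"^\d+\+")

# Same, but the plus sign must be directly followed by "@".
plus_before_at = emails.str.contains(r"^\d+\+@")

print(digits_then_plus.tolist())  # [True, True, False, False]
print(plus_before_at.tolist())    # [True, False, False, False]
```

Only the first pattern keeps both addresses that the question lists as valid.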

Tags: python, regex, pandas