
Automate the Boring Stuff Chapter 7: Regular Expressions - phone number and email extractor only extracting phone numbers

Tags: python

I am following the book and am pretty sure I copied the code verbatim. When I copy the Contact Us page on the publisher website (nostarch.com/ContactUs) and run it through the program, it outputs all the phone numbers but no email addresses.

I made sure the code was copied correctly. I thought it might be an issue with the print function, so I pasted the result into a text file, but the email addresses were still nowhere to be found.

import pyperclip, re

# (the phoneRegex definition is not shown in this post)

# email regex
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+  # username
    @                  # at symbol
    [a-zA-Z0-9.-]+     # domain name
    (\.[a-zA-Z]{2-4})  #dot-something
    )''', re.VERBOSE)

# find matches in clipboard text
text = str(pyperclip.paste())
matches = []
for groups in phoneRegex.findall(text):
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':
        phoneNum += ' x' + groups[8]
    matches.append(phoneNum)
for groups in emailRegex.findall(text):
    matches.append(groups[0])

# copy results to the clipboard
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found.')

I expect to get the result:

Copied to clipboard:
800-420-7240
415-863-9900
415-863-9950
[email protected]
[email protected]
[email protected]
[email protected]

but only got this:

Copied to clipboard:
800-420-7240
415-863-9900
415-863-9950
asked Jan 25 '19 03:01 by xHoudek


1 Answer

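"I made sure the code was copied correctly": not quite. Replace {2-4} with {2,4}. In regex syntax (and in the Chapter 7 text), {2,4} means 2 to 4 repetitions of the preceding pattern. {2-4} is not a valid quantifier, so Python treats it as the literal characters {2-4}, and no real email address can match.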

Consider using https://regex101.com/ to test your regular expressions online and see a full explanation of each part.
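
To see the difference for yourself, here is a minimal sketch that compiles both versions of the pattern and runs them against a made-up sample string. The address info@example.com is a placeholder, not one from the No Starch page, and the names broken, fixed and sample are only for illustration.

```python
import re

# Pattern as posted in the question: {2-4} is not a quantifier,
# so it is matched as the literal text "{2-4}".
broken = re.compile(r'''(
    [a-zA-Z0-9._%+-]+  # username
    @                  # at symbol
    [a-zA-Z0-9.-]+     # domain name
    (\.[a-zA-Z]{2-4})  # dot, one letter, then literal "{2-4}"
    )''', re.VERBOSE)

# Corrected pattern: {2,4} means "2 to 4 of the preceding item".
fixed = re.compile(r'''(
    [a-zA-Z0-9._%+-]+  # username
    @                  # at symbol
    [a-zA-Z0-9.-]+     # domain name
    (\.[a-zA-Z]{2,4})  # dot-something
    )''', re.VERBOSE)

# Hypothetical sample text (placeholder address).
sample = 'Call 415-863-9900 or write to info@example.com.'

# findall() returns one tuple per match; index 0 is the whole address.
broken_matches = [groups[0] for groups in broken.findall(sample)]
fixed_matches = [groups[0] for groups in fixed.findall(sample)]

# The broken pattern only matches text that literally contains "{2-4}".
literal_matches = [groups[0] for groups in broken.findall('odd@example.c{2-4}')]

print(broken_matches)   # []
print(fixed_matches)    # ['info@example.com']
print(literal_matches)  # ['odd@example.c{2-4}']
```

Python raises no error for {2-4}; it silently falls back to literal characters, which is why the program ran without complaint and still found the phone numbers.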

answered Sep 26 '22 16:09 by Poolka
