
Find phone numbers in python script

Tags: python, regex

The following Python script lets me scrape email addresses from a given file using regular expressions.

How could I extend it so that it also extracts phone numbers? Say, either 7-digit or 10-digit (with area code) numbers, and also accounting for parentheses?

My current script can be found below:

    import os
    import re

    # filename variables
    filename = 'file.txt'
    newfilename = 'result.txt'

    # read the file
    if os.path.exists(filename):
        data = open(filename, 'r')
        bulkemails = data.read()
        data.close()
    else:
        print("File not found.")
        raise SystemExit

    # regex for email addresses
    r = re.compile(r'(\b[\w.]+@+[\w.]+.+[\w.]\b)')
    results = r.findall(bulkemails)
    emails = ""
    for x in results:
        emails += str(x) + "\n"

    # function to write file
    def writefile():
        f = open(newfilename, 'w')
        f.write(emails)
        f.close()
        print("File written.")

Regex for phone numbers:

(\d{3}[-\.\s]\d{3}[-\.\s]\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]\d{4}|\d{3}[-\.\s]\d{4}) 

Another regex for phone numbers:

(?:(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})(?:\s*(?:#|x\.?|ext\.?|extension)\s*(\d+))? 
Aaron asked Oct 06 '10 00:10




1 Answer

If you are interested in learning regex, you could take a stab at writing it yourself. It's not quite as hard as it's made out to be. Sites like RegexPal let you enter some test data, then write and test a regular expression against that data. Using RegexPal, try adding some phone numbers in the various formats you expect to find them (with brackets, area codes, etc.), grab a regex cheatsheet and see how far you can get. If nothing else, it will help in reading other people's expressions.
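
If you would rather run this kind of check from Python than in RegexPal, something like the sketch below should do. It assumes Python 3; the sample strings and the helper name `check_pattern` are made up for illustration, and the pattern is the first phone regex from the question.

```python
import re

# First phone-number pattern from the question (a separator is required
# between the digit groups).
PHONE = re.compile(
    r'(\d{3}[-\.\s]\d{3}[-\.\s]\d{4}'
    r'|\(\d{3}\)\s*\d{3}[-\.\s]\d{4}'
    r'|\d{3}[-\.\s]\d{4})'
)

def check_pattern(pattern, samples):
    """Map each sample to True if the pattern matches the whole string."""
    return {s: pattern.fullmatch(s) is not None for s in samples}

# Hypothetical test data: add the formats you expect to see in your file.
samples = ['555-123-4567', '(555) 123-4567', '123-4567', '5551234567']

for sample, ok in check_pattern(PHONE, samples).items():
    print('{:<16} {}'.format(sample, 'match' if ok else 'no match'))
```

The last sample has no separators, so this pattern rejects it.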

Edit: Here is a modified version of your regex, which should also match 7- and 10-digit phone numbers that lack any hyphens, spaces or dots. I added question marks after the character classes (the []s), which make the separator they match optional. I tested it in RegexPal, but as I'm still learning regex, I'm not sure that it's perfect. Give it a try.

(\d{3}[-\.\s]??\d{3}[-\.\s]??\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4}) 

It matched the following values in RegexPal:

    000-000-0000
    000 000 0000
    000.000.0000

    (000)000-0000
    (000)000 0000
    (000)000.0000
    (000) 000-0000
    (000) 000 0000
    (000) 000.0000

    000-0000
    000 0000
    000.0000

    0000000
    0000000000
    (000)0000000
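
To pull phone numbers out of a block of text the same way your script pulls out emails, the modified pattern could be used roughly as follows. This is only a sketch, assuming Python 3: the sample text and the `find_phones` name are invented for illustration, and in your script you would pass it the contents of the file instead.

```python
import re

# Modified pattern from this answer: the separators are optional.
PHONE_RE = re.compile(
    r'(\d{3}[-\.\s]??\d{3}[-\.\s]??\d{4}'
    r'|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}'
    r'|\d{3}[-\.\s]??\d{4})'
)

def find_phones(text):
    """Return every substring of text matched by PHONE_RE, in order."""
    # The pattern has a single capturing group around the whole expression,
    # so findall returns the full matches as plain strings.
    return PHONE_RE.findall(text)

# Hypothetical input standing in for the contents of file.txt.
sample = 'Office: (555) 123-4567, cell 555.987.6543, front desk 123 4567.'
phones = find_phones(sample)
print('\n'.join(phones))
```

Keep in mind that with optional separators and no `\b` anchors, the pattern will also match the first ten digits of any longer run of digits, so real data may need some extra filtering.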
Auguste answered Sep 27 '22 21:09