Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular Expression for finding phone numbers [duplicate]

Possible Duplicates:
A comprehensive regex for phone number validation
grep with regex for phone number

Hello Everyone,

I am new to Stackoverflow and I have a quick question. Let's assume we are given a large number of HTML files (large as in theoretically infinite). How can I use Regular Expressions to extract the list of Phone Numbers from all those files?

Explanation/expression will be really appreciated. The Phone numbers can be any of the following formats:

  • (123) 456 7899
  • (123).456.7899
  • (123)-456-7899
  • 123-456-7899
  • 123 456 7899
  • 1234567899

Thanks a lot for all your help and have a good one!

like image 942
Rocky Avatar asked May 16 '10 01:05

Rocky


2 Answers

/^[\.-)( ]*([0-9]{3})[\.-)( ]*([0-9]{3})[\.-)( ]*([0-9]{4})$/

Should accomplish what you are trying to do.

The first part ^ means the "start of the line" which will force it to account for the whole string.

The [\.-)( ]* that I have in there mean "any period, hyphen, parenthesis, or space appearing 0 or more times".

The ([0-9]{3}) clusters match a group of 3 numbers (the last one is set to match 4)

Hope that helps!

like image 86
Mitch Dempsey Avatar answered Sep 25 '22 02:09

Mitch Dempsey


Without knowing what language you're using I am unsure whether or not the syntax is correct.

This should match all of your groups with very few false positives:

/\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})/

The groups you will be interested in after the match are groups 1, 3, and 4. Group 2 exists only to make sure the first and second separator characters , ., or - are the same.

For example a sed command to strip the characters and leave phone numbers in the form 123456789:

sed "s/(\{0,1\}\([0-9]\{3\}\))\{0,1\}\([ .-]\{0,1\}\)\([0-9]\{3\}\)\2\([0-9]\{4\}\)/\1\3\4/"

Here are the false positives of my expression:

  • (123)456789
  • (123456789
  • (123 456 789
  • (123.456.789
  • (123-456-789
  • 123)456789
  • 123) 456 789
  • 123).456.789
  • 123)-456-789

Breaking up the expression into two parts, one that matches with parenthesis and one that does not will eliminate all of these false positives except for the first one:

/\(([0-9]{3})\)([ .-]?)([0-9]{3})\2([0-9]{4})|([0-9]{3})([ .-]?)([0-9]{3})\5([0-9]{4})/

Groups 1, 3, and 4 or 5, 7, and 8 would matter in this case.

like image 41
Trey Hunner Avatar answered Sep 22 '22 02:09

Trey Hunner