Regular Expression for finding phone numbers [duplicate]

Question

Possible Duplicates:
A comprehensive regex for phone number validation
grep with regex for phone number

Hello Everyone,

I am new to Stack Overflow and have a quick question. Suppose we are given a large number of HTML files (large as in theoretically infinite). How can I use regular expressions to extract the list of phone numbers from all those files?

An explanation of the expression would be really appreciated. The phone numbers can be in any of the following formats:

(123) 456 7899
(123).456.7899
(123)-456-7899
123-456-7899
123 456 7899
1234567899

Thanks a lot for all your help and have a good one!

Mitch Dempsey · Accepted Answer

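/^[-.)( ]*([0-9]{3})[-.)( ]*([0-9]{3})[-.)( ]*([0-9]{4})$/

Should accomplish what you are trying to do.

The leading ^ means "start of the line" and the trailing $ means "end of the line". Together they force the expression to account for the whole string, so it has to be run against one candidate string at a time.

The [-.)( ]* parts mean "any hyphen, period, parenthesis, or space appearing 0 or more times". The hyphen comes first inside the brackets so that it is taken literally rather than as a range between two characters.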

The ([0-9]{3}) clusters each capture a group of 3 digits (the last cluster, ([0-9]{4}), captures 4).
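As a rough illustration, here is how that whole-string check might look in Python, assuming each candidate arrives as its own string. The name `normalize` is made up for this sketch. The hyphen sits first in the character class because Python's `re` rejects a class in which `.-)` reads as a reversed range.

```python
import re

# Whole-string validator: optional punctuation, then 3 + 3 + 4 digits.
# The hyphen is first in each class so it is read as a literal character.
PHONE_LINE = re.compile(r"^[-.)( ]*([0-9]{3})[-.)( ]*([0-9]{3})[-.)( ]*([0-9]{4})$")

def normalize(candidate):
    """Return the ten digits if the whole string is a phone number, else None."""
    match = PHONE_LINE.match(candidate)
    return "".join(match.groups()) if match else None

samples = ["(123) 456 7899", "(123).456.7899", "(123)-456-7899",
           "123-456-7899", "123 456 7899", "1234567899"]
for s in samples:
    print(s, "->", normalize(s))
```

Because of the ^ and $ anchors, a string with surrounding text (for example a full line of HTML) returns None here, so this form suits validation more than extraction.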

Hope that helps!

Trey Hunner · Answer

Without knowing what language you're using, I can't be sure the syntax below is correct for your regex engine.

This should match all of your groups with very few false positives:

/\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})/

The groups you will be interested in after the match are groups 1, 3, and 4. Group 2 exists only to make sure the first and second separator characters (a space, ., or -) are the same.
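A minimal sketch of using this expression for extraction in Python, assuming it begins with an optional escaped opening parenthesis and that the HTML has already been read into a string. The function name `extract_phone_numbers` and the sample markup are invented for illustration.

```python
import re

# Optional "(", 3 digits, optional ")", a separator captured as group 2,
# 3 digits, the same separator again (\2), then 4 digits.
PHONE = re.compile(r"\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})")

def extract_phone_numbers(text):
    """Return each match as a ten-digit string built from groups 1, 3 and 4."""
    return [m.group(1) + m.group(3) + m.group(4) for m in PHONE.finditer(text)]

html = "<p>Call (123) 456 7899 or 555-010-4477.</p><p>Fax: 212.555.0199</p>"
print(extract_phone_numbers(html))
# -> ['1234567899', '5550104477', '2125550199']
```

The backreference is what rejects mixed separators: a string such as 123-456 7899 produces no match at all.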

For example, a sed command to strip the extra characters and leave phone numbers in the form 1234567890:

sed "s/(\{0,1\}\([0-9]\{3\}\))\{0,1\}\([ .-]\{0,1\}\)\([0-9]\{3\}\)\2\([0-9]\{4\}\)/\1\3\4/"
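For readers without sed at hand, roughly the same substitution can be sketched with Python's `re.sub`. The helper name `strip_separators` is hypothetical. One difference to keep in mind: sed without the g flag rewrites only the first match on each line, while `re.sub` rewrites every match.

```python
import re

PHONE = re.compile(r"\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})")

def strip_separators(line):
    """Rewrite every phone number in the line as ten bare digits."""
    # \1, \3 and \4 are the three digit groups; group 2 (the separator) is dropped.
    return PHONE.sub(r"\1\3\4", line)

print(strip_separators("Office: (123) 456 7899, cell: 123-456-7899"))
# -> Office: 1234567899, cell: 1234567899
```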

Here are the false positives of my expression:

(123)4567890
(1234567890
(123 456 7890
(123.456.7890
(123-456-7890
123)4567890
123) 456 7890
123).456.7890
123)-456-7890
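The cause is that each parenthesis is optional on its own, so nothing ties an opening one to a closing one. A small Python check, using ten-digit strings with unbalanced parentheses as assumed test inputs, shows the single expression accepting them in full:

```python
import re

PHONE = re.compile(r"\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})")

def accepts(candidate):
    """True if the expression matches the entire candidate string."""
    return PHONE.fullmatch(candidate) is not None

# Ten-digit strings whose parentheses do not pair up.
unbalanced = ["(1234567890", "(123 456 7890", "123)4567890", "123)-456-7890"]
for s in unbalanced:
    print(repr(s), "accepted:", accepts(s))
```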

Breaking the expression into two parts, one that matches with parentheses and one that matches without, will eliminate all of these false positives except the first one:

/\(([0-9]{3})\)([ .-]?)([0-9]{3})\2([0-9]{4})|([0-9]{3})([ .-]?)([0-9]{3})\6([0-9]{4})/

The groups that matter in this case are 1, 3, and 4 when the first alternative matches, or 5, 7, and 8 when the second one does.
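Here is a hedged Python sketch of the two-part form. In the second alternative the separator is capture group 6, so the sketch uses \6 as its backreference. The helper `digits` and the candidate list (ten-digit strings with stray parentheses) are assumptions made for illustration.

```python
import re

PHONE = re.compile(
    r"\(([0-9]{3})\)([ .-]?)([0-9]{3})\2([0-9]{4})"   # with parentheses: groups 1-4
    r"|([0-9]{3})([ .-]?)([0-9]{3})\6([0-9]{4})"      # without: groups 5-8
)

def digits(match):
    """Join the three digit groups of whichever alternative matched."""
    if match.group(1) is not None:
        return match.group(1) + match.group(3) + match.group(4)
    return match.group(5) + match.group(7) + match.group(8)

candidates = ["(123)4567890", "(1234567890", "(123 456 7890", "(123.456.7890",
              "(123-456-7890", "123)4567890", "123) 456 7890", "123).456.7890",
              "123)-456-7890"]
for s in candidates:
    m = PHONE.fullmatch(s)
    print(repr(s), "->", digits(m) if m else None)
```

Whole-string matching is used here on purpose. When searching inside longer text, the second alternative can still pick up the ten digits that follow a stray opening parenthesis, so surrounding characters may need a separate check.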

Tags:

regex

phone-number

Rocky
