Possible Duplicates:
A comprehensive regex for phone number validation
grep with regex for phone number
Hello Everyone,
I am new to Stack Overflow and I have a quick question. Let's assume we are given a large number of HTML files (large as in theoretically infinite). How can I use regular expressions to extract the list of phone numbers from all those files?
An explanation of the expression would be really appreciated. The phone numbers can be in any of the following formats:
Thanks a lot for all your help and have a good one!
/^[\.\-)( ]*([0-9]{3})[\.\-)( ]*([0-9]{3})[\.\-)( ]*([0-9]{4})$/
This should accomplish what you are trying to do.
The ^ at the start means "start of the line", and together with the $ at the end it forces the pattern to account for the whole string.
The [\.\-)( ]* parts mean "any period, hyphen, parenthesis, or space, appearing 0 or more times". The hyphen is escaped because an unescaped \.-) would be read as a character range from . to ), which most regex engines reject as out of order.
The ([0-9]{3}) groups each match 3 digits (the last group is set to match 4).
Hope that helps!
Without knowing which language you're using, I can't be sure the syntax is correct for it.
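As one example, in Python the first pattern could be used like this. It is a minimal sketch using the standard `re` module; `parse_phone_line` is a hypothetical helper name. The hyphen inside each character class is escaped, since Python would otherwise reject `\.-)` as a bad character range, and because of the `^` and `$` anchors only lines consisting entirely of a phone number will match.

```python
import re

# First answer's pattern, with the hyphen in each character class escaped
# so that "\.\-)" is read as three literal characters rather than a range.
PHONE_LINE = re.compile(
    r"^[\.\-)( ]*([0-9]{3})[\.\-)( ]*([0-9]{3})[\.\-)( ]*([0-9]{4})$"
)


def parse_phone_line(line):
    """Return the 10 digits of a line that is only a phone number, else None."""
    match = PHONE_LINE.match(line)
    if match is None:
        return None
    # Groups 1-3 hold the 3-digit, 3-digit and 4-digit blocks.
    return "".join(match.groups())


for sample in ["(123) 456-7890", "123.456.7890", "call 123-456-7890"]:
    print(sample, "->", parse_phone_line(sample))
```

The last sample returns None: the anchors make the whole line count, so text before the number prevents a match.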
This should match all of your formats with very few false positives:
/\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})/
The groups you will be interested in after the match are groups 1, 3, and 4. Group 2 exists only to make sure that the first and second separator characters (a space, ., or -) are the same.
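To apply this to the original problem of pulling numbers out of HTML, here is a Python sketch that scans a string with `re.finditer` and joins groups 1, 3, and 4. It assumes the standard `re` module; `extract_numbers` and the sample HTML are made up for illustration, and reading the files themselves is left out.

```python
import re

# \2 requires the second separator to be identical to the first one.
PHONE = re.compile(r"\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})")


def extract_numbers(text):
    """Return every phone number found in text as a 10-digit string."""
    return [m.group(1) + m.group(3) + m.group(4) for m in PHONE.finditer(text)]


html = (
    "<p>Office: (123) 456 7890</p>"
    "<p>Fax: 321.654.0987</p>"
    "<p>Bad: 123-456.7890</p>"
)
print(extract_numbers(html))
```

The mixed-separator number 123-456.7890 is not returned, because the \2 backreference requires both separators to be the same character.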
For example, here is a sed command that strips the parentheses and separator characters and leaves phone numbers in the form 1234567890:
sed "s/(\{0,1\}\([0-9]\{3\}\))\{0,1\}\([ .-]\{0,1\}\)\([0-9]\{3\}\)\2\([0-9]\{4\}\)/\1\3\4/"
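If you would rather stay in Python, the sed substitution maps onto `re.sub` with the same \1\3\4 replacement. This is a sketch rather than a drop-in replacement: `count=1` is an assumption chosen to mirror sed's default of replacing only the first match on each line (no `g` flag), and `strip_separators` is a hypothetical name.

```python
import re

# Same pattern as the sed command, in Python syntax: optional "(",
# 3 digits, optional ")", a separator, 3 digits, the same separator
# again (\2), then 4 digits.
PHONE = re.compile(r"\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})")


def strip_separators(line):
    """Rewrite the first phone number on the line as bare digits, like the sed command."""
    return PHONE.sub(r"\1\3\4", line, count=1)


for line in ["(123) 456 7890", "tel: 123.456.7890 or 111-222-3333"]:
    print(strip_separators(line))
```

Because of `count=1`, the second number on the second line is left untouched, just as sed without `g` would leave it.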
Here are the false positives of my expression:
Breaking the expression into two alternatives, one that matches with parentheses and one that does not, eliminates all of these false positives except the first one:
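/\(([0-9]{3})\)([ .-]?)([0-9]{3})\2([0-9]{4})|([0-9]{3})([ .-]?)([0-9]{3})\6([0-9]{4})/
Either groups 1, 3, and 4 or groups 5, 7, and 8 would matter in this case, depending on which alternative matched. The separator of the second alternative is group 6, which is why it is back-referenced as \6.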
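A Python sketch of the two-alternative pattern follows, assuming the standard `re` module; `digits` is a hypothetical helper. The second alternative's separator is group 6, so its backreference is written \6, and checking whether group 1 is None tells which alternative matched.

```python
import re

# Alternative 1: area code in parentheses (groups 1-4).
# Alternative 2: area code without parentheses (groups 5-8).
# Each alternative back-references its own separator group (\2 and \6).
PHONE = re.compile(
    r"\(([0-9]{3})\)([ .-]?)([0-9]{3})\2([0-9]{4})"
    r"|([0-9]{3})([ .-]?)([0-9]{3})\6([0-9]{4})"
)


def digits(match):
    """Join the area code, prefix and line number from whichever alternative matched."""
    if match.group(1) is not None:
        return match.group(1) + match.group(3) + match.group(4)
    return match.group(5) + match.group(7) + match.group(8)


text = "(123) 456 7890, 321-654-0987, 123) 456 7890"
print([digits(m) for m in PHONE.finditer(text)])
```

The last entry, which has only a closing parenthesis, is rejected by the split pattern, whereas the single pattern with optional parentheses would accept it.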