I'm trying to use a regex to find phone numbers of the form (xxx) xxx-xxxx inside a text document containing messy HTML.
The text file has lines like:
<div style="font-weight:bold;">
<div>
<strong>Main Phone:
<span style="font-weight:normal;">(713) 555-9539
<strong>Main Fax:
<span style="font-weight:normal;">(713) 555-9541
<strong>Toll Free:
<span style="font-weight:normal;">(888) 555-9539
and my code contains:
Pattern p = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");
Matcher m = p.matcher(line); //from buffered reader, reading 1 line at a time
if (m.matches()) {
    stringArray.add(line);
}
The problem is that even when I compile very simple patterns, nothing is returned. And if it doesn't even recognize something like \d, how am I going to match a telephone number? For example:
Pattern p = Pattern.compile("\\d+"); //Returns nothing
Pattern p = Pattern.compile("\\d"); //Returns nothing
Pattern p = Pattern.compile("\\s+"); //Returns lines
Pattern p = Pattern.compile("\\D"); //Returns lines
This is really confusing to me, and any help would be appreciated.
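Use Matcher#find() instead of matches(). matches() succeeds only if the complete line is a phone number, so it fails on any line that also contains HTML. That is also why \\d+ returned nothing (no line consists solely of digits) while \\s+ returned only lines made up entirely of whitespace. find() searches the line and returns true for sub-string matches as well.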
Matcher m = p.matcher(line);
Also, the line above suggests you're compiling the same Pattern and creating a new Matcher on every iteration of your loop. That's not efficient. Move the Pattern outside the loop, then reset and reuse the same Matcher for each line.
Pattern p = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}"); // compile once, outside the loop
Matcher m = p.matcher(""); // one Matcher, reset for each line
String line;
while ((line = reader.readLine()) != null) {
    m.reset(line);
    if (m.find()) { // true if the line contains a phone number anywhere
        stringArray.add(line);
    }
}
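If you want the phone numbers themselves rather than the whole lines, here is a minimal sketch of one way to do it. The class name PhoneFinder and the helper extractPhones are made up for illustration and are not from your code. It reuses your pattern, shows the difference between matches() and find() on one of your sample lines, and collects each match with group():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class PhoneFinder {
    // Same pattern as in the question: (xxx) xxx-xxxx
    private static final Pattern PHONE =
            Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

    // Hypothetical helper: returns every phone number found in the given lines.
    static List<String> extractPhones(List<String> lines) {
        List<String> phones = new ArrayList<>();
        Matcher m = PHONE.matcher(""); // created once, reset for each line
        for (String line : lines) {
            m.reset(line);
            while (m.find()) {         // a line may hold more than one number
                phones.add(m.group()); // the matched text only, not the whole line
            }
        }
        return phones;
    }

    public static void main(String[] args) {
        String line = "<span style=\"font-weight:normal;\">(713) 555-9539";

        // matches() needs the entire line to be a phone number
        System.out.println(PHONE.matcher(line).matches()); // false
        // find() looks for the pattern anywhere in the line
        System.out.println(PHONE.matcher(line).find());    // true

        System.out.println(extractPhones(Arrays.asList("<strong>Main Phone:", line)));
        // [(713) 555-9539]
    }
}
```

Using while instead of if around find() is a choice I made so that a line with two numbers yields both. If each line holds at most one number, a single find() call is enough.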