Regex to parse phone numbers in text document with java

Tags: java, html, regex

I'm trying to use a regex to find phone numbers of the form (xxx) xxx-xxxx inside a text document full of messy HTML.

The text file has lines like:

  <div style="font-weight:bold;">
  <div>
   <strong>Main Phone:
   <span style="font-weight:normal;">(713) 555-9539&nbsp;&nbsp;&nbsp;&nbsp;
   <strong>Main Fax:
   <span style="font-weight:normal;">(713) 555-9541&nbsp;&nbsp;&nbsp;&nbsp;
   <strong>Toll Free:
   <span style="font-weight:normal;">(888) 555-9539

My code contains:

Pattern p = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");
Matcher m = p.matcher(line); //from buffered reader, reading 1 line at a time

if (m.matches()) {
     stringArray.add(line);
}

The problem is that even when I compile very simple patterns, nothing is returned. If it doesn't even recognize something like \d, how am I going to match a telephone number? For example:

Pattern p = Pattern.compile("\\d+"); //Returns nothing
Pattern p = Pattern.compile("\\d");  //Returns nothing
Pattern p = Pattern.compile("\\s+"); //Returns lines
Pattern p = Pattern.compile("\\D");  //Returns lines

This is really confusing to me, and any help would be appreciated.

James Phillips asked Jun 06 '26 12:06


1 Answer

Use Matcher#find() instead of matches(). matches() succeeds only if the entire line is a phone number, whereas find() scans the input and returns true as soon as any substring matches.
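A minimal sketch of the difference, using a line shaped like the ones in the question. The class and method names here are made up for illustration. It also explains the test patterns in the question: with matches(), "\\d+" accepts only lines that consist entirely of digits (there are none), while "\\s+" accepts lines that contain nothing but whitespace.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // Same phone pattern as in the question.
    static final Pattern PHONE = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

    // True only if the ENTIRE input is a phone number.
    static boolean wholeLineIsPhone(String line) {
        return PHONE.matcher(line).matches();
    }

    // True if a phone number appears anywhere in the input.
    static boolean containsPhone(String line) {
        return PHONE.matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "   <span style=\"font-weight:normal;\">(713) 555-9539&nbsp;&nbsp;";
        System.out.println(wholeLineIsPhone(line));             // false
        System.out.println(containsPhone(line));                // true
        System.out.println(wholeLineIsPhone("(713) 555-9539")); // true
    }
}
```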

Matcher m = p.matcher(line);

Also, the line above suggests you're compiling the same Pattern and creating a new Matcher on every iteration of your loop, which is inefficient. Compile the Pattern once outside the loop, then reset and reuse a single Matcher for each line.

Pattern p = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

// Create the Matcher once against an empty input, then reset it for each line.
Matcher m = p.matcher("");
String line;
while ((line = reader.readLine()) != null) {
    m.reset(line);
    if (m.find()) {  // true if the line contains a phone number anywhere
        stringArray.add(line);
    }
}
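The loop above collects whole lines that contain a phone number. If the goal is the numbers themselves, one possible variation is to call find() repeatedly and read each match with group(). This is a sketch under that assumption, not part of the original code; PhoneExtractor and extractPhones are hypothetical names, and the input is an in-memory list rather than a BufferedReader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneExtractor {
    // Compiled once and shared; same pattern as in the question.
    private static final Pattern PHONE = Pattern.compile("\\(\\d{3}\\)\\s\\d{3}-\\d{4}");

    // Returns every phone number found in the given lines, in order of appearance.
    static List<String> extractPhones(Iterable<String> lines) {
        List<String> phones = new ArrayList<>();
        Matcher m = PHONE.matcher("");   // reused for every line
        for (String line : lines) {
            m.reset(line);
            while (m.find()) {           // a line may hold more than one number
                phones.add(m.group());   // the matched text only, without the HTML
            }
        }
        return phones;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "   <strong>Main Phone:",
            "   <span style=\"font-weight:normal;\">(713) 555-9539&nbsp;&nbsp;",
            "   <span style=\"font-weight:normal;\">(888) 555-9539");
        System.out.println(extractPhones(lines)); // [(713) 555-9539, (888) 555-9539]
    }
}
```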

Ravi K Thapliyal answered Jun 08 '26 02:06