I am having problems using in Java a regular expression that I used in JavaScript. On a web page, you may have:
<b>Renewal Date:</b> 03 May 2010</td>
I just want to pull out the 03 May 2010, bearing in mind that the page contains more than just the above content. This is how I currently do it in JavaScript:
DateStr = /<b>Renewal Date:<\/b>(.+?)<\/td>/.exec(returnedHTMLPage);
I tried to follow some tutorials on java.util.regex.Pattern and java.util.regex.Matcher with no luck. I can't seem to translate (.+?) into something they understand.
thanks,
Noeneel
This is how regular expressions are used in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("<b>Renewal Date:</b>(.+?)</td>");
Matcher m = p.matcher(returnedHTMLPage);
if (m.find()) { // find the next match (and capture its groups)
    System.out.println(m.group(1).trim()); // prints whatever .+? matched, minus the leading space
}
There are other useful methods in the Matcher class, such as m.matches(). Have a look at the Matcher API documentation.
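For reference, here is a complete, self-contained sketch of the same approach. It loops with while (m.find()) to collect every match instead of only the first one. The class name RenewalDateExtractor, the method extractAll and the sample HTML are made up for illustration, and the sketch assumes each date sits on the same line as its label, since . does not match line breaks by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects every "Renewal Date" value found in an HTML string.
class RenewalDateExtractor {

    // Compile once and reuse; Pattern instances are immutable and thread-safe.
    private static final Pattern RENEWAL_DATE =
            Pattern.compile("<b>Renewal Date:</b>(.+?)</td>");

    // Returns the trimmed text captured by (.+?) for every match, in order.
    static List<String> extractAll(String html) {
        List<String> dates = new ArrayList<>();
        Matcher m = RENEWAL_DATE.matcher(html);
        while (m.find()) {                // each call resumes after the previous match
            dates.add(m.group(1).trim()); // group(1) includes the leading space
        }
        return dates;
    }

    public static void main(String[] args) {
        // Made-up sample page with two rows.
        String html = "<tr><td><b>Renewal Date:</b> 03 May 2010</td></tr>\n"
                    + "<tr><td><b>Renewal Date:</b> 17 Jun 2011</td></tr>";
        System.out.println(extractAll(html)); // prints [03 May 2010, 17 Jun 2011]
    }
}
```

If you only ever need the first date, replace the while with an if, exactly as in the snippet above.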
matches vs find
The problem is that you used matches when you should've used find. From the API:
- The matches method attempts to match the entire input sequence against the pattern.
- The find method scans the input sequence looking for the next subsequence that matches the pattern.
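To see the difference on concrete input, the sketch below runs matches(), lookingAt() and find() against the same short HTML fragment. The class name and the sample string are hypothetical. Only find() succeeds, because the fragment has extra markup before and after the part the pattern describes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo contrasting Matcher.matches(), lookingAt() and find().
class MatchesVsFind {
    static final Pattern P = Pattern.compile("<b>Renewal Date:</b>(.+?)</td>");

    public static void main(String[] args) {
        String html = "<td><b>Renewal Date:</b> 03 May 2010</td></tr>";

        // matches(): the WHOLE input must match; "<td>" and "</tr>" are extra.
        System.out.println(P.matcher(html).matches());   // false
        // lookingAt(): the match must start at index 0; the input starts with "<td>".
        System.out.println(P.matcher(html).lookingAt()); // false
        // find(): any substring may match.
        Matcher m = P.matcher(html);
        System.out.println(m.find());                    // true
        System.out.println(m.group(1).trim());           // 03 May 2010
    }
}
```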
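Note that String.matches(String regex) also looks for a full match of the entire string. Unfortunately String does not provide a partial regex match, but you can always use s.matches(".*pattern.*") instead. For input that spans several lines, such as an HTML page, write s.matches("(?s).*pattern.*"), because . does not match line terminators unless the (?s) (DOTALL) flag is on.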
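The sketch below (class name and page text are made up) shows why a multi-line page needs the (?s) flag on top of the .* wrapper, and how Matcher.find() avoids the wrapper altogether.

```java
import java.util.regex.Pattern;

// Hypothetical demo of String.matches() full-match semantics.
class StringMatchesDemo {
    // Made-up sample page; note the line breaks around the <td> cell.
    static final String PAGE = "<html>\n<td><b>Renewal Date:</b> 03 May 2010</td>\n</html>";
    // The part we actually care about.
    static final String CORE = "<b>Renewal Date:</b>.+?</td>";

    public static void main(String[] args) {
        // String.matches() must consume the whole string, so the bare pattern fails.
        System.out.println(PAGE.matches(CORE));                         // false
        // ".*" cannot cross the '\n' characters without DOTALL.
        System.out.println(PAGE.matches(".*" + CORE + ".*"));           // false
        // (?s) turns on DOTALL, so ".*" can now span lines.
        System.out.println(PAGE.matches("(?s).*" + CORE + ".*"));       // true
        // find() needs no wrapper at all: it looks for a matching substring.
        System.out.println(Pattern.compile(CORE).matcher(PAGE).find()); // true
    }
}
```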
Java understands (.+?) perfectly.
Here's a demonstration: you're given a string s that consists of a string t repeating at least twice. Find t.
System.out.println("hahahaha".replaceAll("^(.+)\\1+$", "($1)"));
// prints "(haha)" -- greedy takes longest possible
System.out.println("hahahaha".replaceAll("^(.+?)\\1+$", "($1)"));
// prints "(ha)" -- reluctant takes shortest possible
It should also be said that you have unnecessarily inserted \ into your regex (written "\\" in a Java string literal):
String regexDate = "<b>Expiry Date:<\\/b>(.+?)<\\/td>"; // both \\/ escapes are unnecessary
Pattern p2 = Pattern.compile("<b>Expiry Date:<\\/b>");   // this \\/ is unnecessary too
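\ is used to escape regex metacharacters. A / is NOT a regex metacharacter. It only needed escaping in your JavaScript version because / delimits a JavaScript regex literal; Java regexes are plain strings, so there is nothing to delimit.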
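As a quick check, the sketch below compiles the escaped and unescaped versions and extracts the same value with both. Java's Pattern accepts a backslash before any non-alphabetic character, so \/ is harmless, just redundant. The class name, the firstGroup helper and the sample date are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: "\\/" and "/" mean the same thing in a Java regex.
class SlashEscapeDemo {
    static final Pattern ESCAPED   = Pattern.compile("<b>Expiry Date:<\\/b>(.+?)<\\/td>");
    static final Pattern UNESCAPED = Pattern.compile("<b>Expiry Date:</b>(.+?)</td>");

    // Returns the trimmed first capture group, or null when there is no match.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<td><b>Expiry Date:</b> 03 May 2011</td>"; // made-up sample
        System.out.println(firstGroup(ESCAPED, html));   // 03 May 2011
        System.out.println(firstGroup(UNESCAPED, html)); // 03 May 2011
    }
}
```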