I'm trying to get the text within a certain tag. So if I have:
<a href="http://something.com">Found</a>
I want to be able to retrieve the Found text.
I'm trying to do it using regex. I am able to do it if the <a href="http://something.com"> part stays the same, but it doesn't.
So far I have this:
Pattern titleFinder = Pattern.compile( ".*[a-zA-Z0-9 ]* ([a-zA-Z0-9 ]*)</a>.*" );
I think the last two parts, the ([a-zA-Z0-9 ]*)</a>.* , are OK, but I don't know what to do for the first part.
An anchor is a piece of text which marks the beginning and/or the end of a hypertext link. The text between the opening tag and the closing tag is either the start or the destination (or both) of a link. The attributes of the anchor tag include HREF, which is optional.
In a Java string literal, "\b" is the backspace character (char 0x08), so a regex compiled from it matches a literal backspace. To get the regex word boundary instead, escape the backslash and write "\\b".
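The difference between the two spellings can be checked with a small sketch. The class name BackspaceVsBoundary and both method names are made up for this illustration, and the word "cat" is just a sample.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
final class BackspaceVsBoundary {

    // "\b" in Java source is the single character U+0008 (backspace),
    // so this pattern only matches a literal backspace in the input.
    public static boolean matchesBackspace(String input) {
        return Pattern.compile("\b").matcher(input).find();
    }

    // "\\b" reaches the regex engine as \b, the word-boundary assertion.
    public static boolean hasWholeWordCat(String input) {
        return Pattern.compile("\\bcat\\b").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesBackspace("a\bb"));       // true
        System.out.println(matchesBackspace("ab"));         // false
        System.out.println(hasWholeWordCat("a cat sat"));   // true
        System.out.println(hasWholeWordCat("concatenate")); // false
    }
}
```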
HTML <a> Tag. The <a> tag (anchor tag) in HTML is used to create a hyperlink on a webpage. The hyperlink points to another web page or to a section of the same page. Its “href” value can be either an absolute or a relative reference.
\\s*,\\s* matches zero or more whitespace characters, followed by a comma, followed by zero or more whitespace characters. \s is one of the shorthand character classes. You can find similar patterns at http://www.regular-expressions.info/shorthand.html.
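As a quick illustration of that pattern, here is a minimal sketch that uses it as a delimiter for String.split. The class name CommaSplit, the method name splitOnCommas, and the sample input are assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original post.
final class CommaSplit {

    // Splits on a comma together with any whitespace around it.
    // Whitespace at the very start or end of the input is not trimmed.
    public static String[] splitOnCommas(String input) {
        return input.split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnCommas("a , b,c ,  d")));
        // prints [a, b, c, d]
    }
}
```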
As they said, don't use regex to parse HTML. If you are aware of the shortcomings, you might get away with it, though. Try
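// Requires java.util.regex.Pattern and java.util.regex.Matcher
Pattern titleFinder = Pattern.compile("<a[^>]*>(.*?)</a>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
// subjectString is the HTML text to search
Matcher regexMatcher = titleFinder.matcher(subjectString);
while (regexMatcher.find()) {
    // group(1) is the text between the opening <a ...> tag and the next </a>
    String linkText = regexMatcher.group(1);
}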
This iterates over all matches in a string. It won't handle nested <a> tags, and it ignores all the attributes inside the tag.
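For anyone who wants to run this, here is a minimal self-contained sketch built around the same pattern. The class name AnchorTextExtractor, the method name extractAnchorTexts, and the sample HTML are assumptions made for this example; only the regex and flags come from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class, not part of the original answer.
final class AnchorTextExtractor {

    // <a[^>]*> matches the opening tag with whatever attributes it has;
    // (.*?) lazily captures everything up to the next </a>.
    // DOTALL lets the text span lines, CASE_INSENSITIVE also accepts <A ...>.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a[^>]*>(.*?)</a>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns the captured text of every anchor found, in document order.
    public static List<String> extractAnchorTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher matcher = ANCHOR.matcher(html);
        while (matcher.find()) {
            texts.add(matcher.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://something.com\">Found</a> and "
                + "<A HREF='other.html'>Also found</A>";
        System.out.println(extractAnchorTexts(html));
        // prints [Found, Also found]
    }
}
```

Because the capture is lazy, a nested anchor would be cut off at the first </a>, which is the limitation mentioned above. If the input is arbitrary HTML rather than a known, simple format, a real HTML parser is likely the more robust choice.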