Java Regex to get the text from HTML anchor (<a>...</a>) tags

Tags: java, regex

I'm trying to get the text within a certain tag. So if I have:

<a href="http://something.com">Found</a>

I want to be able to retrieve the Found text.

I'm trying to do it using a regex. I can do it if the <a href="http://something.com"> part stays the same, but it doesn't.

So far I have this:

Pattern titleFinder = Pattern.compile( ".*[a-zA-Z0-9 ]* ([a-zA-Z0-9 ]*)</a>.*" );

I think the last two parts, ([a-zA-Z0-9 ]*)</a>.*, are OK, but I don't know what to do for the first part.

asked Jan 07 '11 18:01

BeginnerPro

1 Answer

As they said, don't use regex to parse HTML. If you are aware of the shortcomings, you might get away with it, though. Try

// \\b keeps "<a" from also matching other tags such as <abbr> or <article>
Pattern titleFinder = Pattern.compile("<a\\b[^>]*>(.*?)</a>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher regexMatcher = titleFinder.matcher(subjectString);
while (regexMatcher.find()) {
    // regexMatcher.group(1) holds the text between <a ...> and </a>
}

This will iterate over all matches in the string.

It won't handle nested <a> tags and ignores all the attributes inside the tag.
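
For reference, here is the same idea as a small self-contained class you could compile and call. It is only a sketch: the class name AnchorTextExtractor and the method extract are made up for illustration, and the pattern uses \b after <a so that tags like <abbr> are not mistaken for anchors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from any library.
class AnchorTextExtractor {

    // <a\b      : literal "<a" followed by a word boundary (so <abbr> is skipped)
    // [^>]*>    : any attributes, up to the end of the opening tag
    // (.*?)     : group 1, the shortest possible run of text before the closing tag
    // </a>      : the closing tag
    // DOTALL lets '.' match line breaks; CASE_INSENSITIVE also accepts <A>...</A>.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b[^>]*>(.*?)</a>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns the raw text found between each <a ...> and the next </a>.
    static List<String> extract(String html) {
        List<String> texts = new ArrayList<>();
        Matcher matcher = ANCHOR.matcher(html);
        while (matcher.find()) {
            texts.add(matcher.group(1));
        }
        return texts;
    }
}
```

Calling AnchorTextExtractor.extract("<a href=\"http://something.com\">Found</a>") gives a list containing just Found. The nesting caveat still applies: for <a href="x">outer <a href="y">inner</a> tail</a> the lazy group stops at the first </a>, so you get the single entry outer <a href="y">inner, markup included. If that matters for your input, a real HTML parser is the safer choice.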

answered Sep 28 '22 02:09

Tim Pietzcker
