
Regex in Java: matching multiple occurrences

Tags:

java

regex

I am trying to match multiple CSS style code blocks in an HTML document. The code below matches the first block but not the second. What do I need to change to match the second one? Can I just get a list of the groups that are inside my 'style' tags? Should I call the 'find' method to get the next match?

Here is my regex pattern:

^.*(<style type="text/css">)(.*)(</style>).*$

Usage:

final Pattern pattern_css = Pattern.compile(css_pattern_buf.toString(),
        Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL);

final Matcher match_css = pattern_css.matcher(text);
if (match_css.matches() && (match_css.groupCount() >= 3)) {
    System.out.println("Woot ==>" + match_css.groupCount());
    System.out.println(match_css.group(2));
} else {
    System.out.println("No Match");
}
Berlin Brown asked Dec 09 '22 22:12



1 Answer

I am trying to match multiple CSS style code blocks in an HTML document.

Standard Answer: don't use regex to parse HTML. Regex cannot parse HTML reliably, no matter how complicated and clever you make your expression. Unless you are absolutely sure that the exact format of the target document is totally fixed, string or regex processing is insufficient and you must use an HTML parser.

(<style type="text/css">)(.*)(</style>)

That's a greedy expression. The (.*) in the middle will match as much as it possibly can. If you have two style blocks:

<style type="text/css">1</style> <style type="text/css">2</style>

then it will happily match '1</style> <style type="text/css">2'.

Use (.*?) to get a non-greedy expression, which will allow the trailing (</style>) to match at the first opportunity.
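To see the difference concretely, here is a minimal sketch that runs both variants against the two-block example above. The class and method names (GreedyVsReluctant, firstBody) are made up for illustration, and the pattern keeps only the one capturing group that matters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: compares a greedy and a reluctant quantifier.
final class GreedyVsReluctant {

    // Returns group 1 of the first match found in input, or null if there is none.
    static String firstBody(String regex, String input) {
        Matcher m = Pattern
                .compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL)
                .matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<style type=\"text/css\">1</style> <style type=\"text/css\">2</style>";

        String greedy = "<style type=\"text/css\">(.*)</style>";
        String reluctant = "<style type=\"text/css\">(.*?)</style>";

        // Greedy: (.*) runs to the last </style> in the input.
        System.out.println(firstBody(greedy, html));    // 1</style> <style type="text/css">2

        // Reluctant: (.*?) stops at the first </style>.
        System.out.println(firstBody(reluctant, html)); // 1
    }
}
```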

Should I call the 'find' method to get the next match?

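Yes, and you should have used it to get the first match too. matches() succeeds only if the pattern matches the entire input, whereas find() scans for the next occurrence of the pattern. The usual idiom is:

while (matcher.find()) {
    s = matcher.group(n);
}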
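Put together with the non-greedy pattern, the idiom might look like the sketch below. It assumes the opening tag is always written exactly as in the question (apart from letter case), and the names StyleBlocks and extract are hypothetical. Note that the leading ^.* and trailing .*$ from the question are dropped, since find() locates each occurrence by itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the body of every matching style block.
final class StyleBlocks {

    // DOTALL lets '.' cross line breaks; no ^.* or .*$ wrapper is needed with find().
    private static final Pattern STYLE = Pattern.compile(
            "<style type=\"text/css\">(.*?)</style>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> extract(String html) {
        List<String> bodies = new ArrayList<>();
        Matcher m = STYLE.matcher(html);
        // Each call to find() resumes scanning after the previous match.
        while (m.find()) {
            bodies.add(m.group(1));
        }
        return bodies;
    }

    public static void main(String[] args) {
        String html = "<html><head>\n"
                + "<style type=\"text/css\">a { color: red; }</style>\n"
                + "<STYLE type=\"text/css\">\np { margin: 0; }\n</STYLE>\n"
                + "</head></html>";
        for (String body : extract(html)) {
            System.out.println("[" + body + "]");
        }
    }
}
```

This still inherits every limitation of the Standard Answer: a style tag with extra attributes, different quoting, or a "</style>" inside a comment will defeat it.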

Note that standard string processing (indexOf, etc.) may be a simpler approach for you than regex, since you're only using completely fixed strings. However, the Standard Answer still applies.
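For comparison, a rough indexOf-based version could look like this. It is only a sketch under the same fixed-format assumption; the names StyleBlocksIndexOf and extract are invented here, and unlike the CASE_INSENSITIVE pattern this version matches the tags case-sensitively.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: same job as the regex version, using plain string search.
final class StyleBlocksIndexOf {

    private static final String OPEN = "<style type=\"text/css\">";
    private static final String CLOSE = "</style>";

    static List<String> extract(String html) {
        List<String> bodies = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = html.indexOf(OPEN, from);
            if (start < 0) {
                break; // no further opening tag
            }
            int bodyStart = start + OPEN.length();
            int end = html.indexOf(CLOSE, bodyStart);
            if (end < 0) {
                break; // unterminated block: stop rather than guess
            }
            bodies.add(html.substring(bodyStart, end));
            from = end + CLOSE.length(); // continue after this block
        }
        return bodies;
    }

    public static void main(String[] args) {
        String html = "<style type=\"text/css\">1</style> <style type=\"text/css\">2</style>";
        System.out.println(extract(html)); // [1, 2]
    }
}
```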

bobince answered Dec 25 '22 14:12
