
How would I write a Java regex that gets the contents of a <script> tag?

I'm trying to integrate analytics into my GWT application. To do this, I'm calling a service that returns a String of HTML that needs to be parsed and eval'ed.

I need a regex that finds and captures either 1) the body of the tag or 2) the value of the "src" attribute. I want to eval both of these with JavaScript. I'm happy to assume that if a "src" attribute exists, the body can be ignored.

Thanks,

Matt

Asked Jan 23 '23 20:01 by Matt Raible



2 Answers

Must it be a regex? You can use the DOM to obtain this information. Here is a trivial example that gets the contents of the BODY tag; you could apply the same approach to any element:

    // Grab the first <body> element and show its HTML contents
    function test() {
        var body = document.getElementsByTagName("body")[0];
        alert(body.innerHTML);
    }
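The same DOM idea also works from Java, but only under a strong assumption: the HTML returned by the service must be well-formed XHTML (every tag closed, script bodies free of unescaped `<` and `&`), so that the JDK's XML parser accepts it. This is a minimal sketch; the class `ScriptDom` and the method `scriptsToEval` are hypothetical names, and it assumes it runs in a regular JVM such as the server, because `javax.xml` is not available in GWT client-side code. Real-world HTML is often not well-formed, which is where a forgiving HTML parser helps.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; assumes the input is well-formed XML (XHTML).
public class ScriptDom {

    /** Returns the src value of each <script>, or its text body when src is absent. */
    public static List<String> scriptsToEval(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Throws SAXParseException if the markup is not well-formed XML.
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        List<String> result = new ArrayList<>();
        NodeList scripts = doc.getElementsByTagName("script");
        for (int i = 0; i < scripts.getLength(); i++) {
            Element script = (Element) scripts.item(i);
            String src = script.getAttribute("src"); // "" when the attribute is missing
            result.add(src.isEmpty() ? script.getTextContent() : src);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><head>"
                + "<script src=\"http://test.com/some.js\"/>"
                + "<script>alert('hi');</script>"
                + "</head></html>";
        for (String s : scriptsToEval(xhtml)) {
            System.out.println(s);
        }
    }
}
```

`getAttribute` returns an empty string, not `null`, when the attribute is missing, which is why the sketch checks `isEmpty()`.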
Answered Jan 29 '23 10:01 by t3rse



This seems to do what you want:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ScriptTagRegexTest {
        public static void main(String[] args) {
            final String srcOne = "<html>\r\n<head>\r\n<script src=\"http://test.com/some.js\"/>\r\n</head></html>";
            final String srcTwo = "<html>\r\n<head>\r\n<script src=\"http://test.com/some.js\"></script>\r\n</head></html>";
            final String tag = "<html>\r\n<head>\r\n<script>\r\nfunction() {\r\n\talert('hi');\r\n}\r\n</script>\r\n</head></html>";
            final String tagAndSrc = "<html>\r\n<head>\r\n<script src=\"http://test.com/some.js\">\r\nfunction() {\r\n\talert('hi');\r\n}\r\n</script>\r\n</head></html>";
            final String[] tests = new String[] {srcOne, srcTwo, tag, tagAndSrc, srcOne + srcTwo, tag + srcOne + tagAndSrc};

            // Group 1: the value of a quoted src attribute (any body is then ignored).
            // Group 2: the body of a <script> tag that has no src attribute.
            // The lazy (.*?) stops at the first </script>; DOTALL lets it span line
            // breaks, and unlike [^<]* it still matches bodies that contain '<'.
            final String regex = "<script(?:[^>]*src=['\"]([^'\"]*)['\"][^>]*>|[^>]*>(.*?)</script>)";
            final Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
            for (int testNumber = 0; testNumber < tests.length; ++testNumber) {
                final String test = tests[testNumber];
                final Matcher matcher = pattern.matcher(test);
                System.out.println("--------------------------------");
                System.out.println("TEST " + testNumber + ": " + test);
                // Each find() is one <script> tag; exactly one of the two groups is non-null.
                while (matcher.find()) {
                    System.out.println("GROUP 1: " + matcher.group(1));
                    System.out.println("GROUP 2: " + matcher.group(2));
                }
                System.out.println("--------------------------------");
                System.out.println();
            }
        }
    }
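A body pattern of `[^<]*` stops at the first `<`, so it cannot capture a script such as `if (a < b) { ... }`. A lazy `(.*?)` instead stops at the first `</script>`, and `Pattern.DOTALL` lets `.` match line breaks as well (in a pattern that contains no `.`, DOTALL has no effect). The minimal comparison below runs both variants against such a body; the class name `ScriptBodyCompare` is hypothetical. Neither variant copes with a body that contains the literal text `</script>`, for example inside a string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo comparing two ways of capturing a <script> body.
public class ScriptBodyCompare {

    // Body may not contain '<', so a comparison like "a < b" breaks the match.
    static final Pattern NEGATED_CLASS = Pattern.compile(
            "<script[^>]*>([^<]*)</script>", Pattern.CASE_INSENSITIVE);

    // Lazy body stops at the first "</script>"; DOTALL lets '.' match newlines.
    static final Pattern LAZY_DOTALL = Pattern.compile(
            "<script[^>]*>(.*?)</script>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the first captured body, or null when the pattern does not match. */
    static String body(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<script>\r\nif (a < b) {\r\n\talert('hi');\r\n}\r\n</script>";
        System.out.println("[^<]*        -> " + body(NEGATED_CLASS, html));
        System.out.println("(.*?)+DOTALL -> " + body(LAZY_DOTALL, html));
    }
}
```

With the sample input, the `[^<]*` variant finds no match at all, while the lazy variant returns the whole body, line breaks included.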

That said, you would probably be better off using a real HTML parser such as TagSoup, if that is at all possible.
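If a parser is not an option, the regex approach can be wrapped in a helper that applies the rule from the question directly: use the `src` value when present, otherwise the tag body. This is a sketch under stated assumptions, not a robust HTML parser: the names `ScriptExtractor`, `ScriptRef` and `extract` are hypothetical, the `src` value is assumed to be quoted, and bodies are assumed not to contain the literal text `</script>`. Like the code above, it relies on `java.util.regex`, so it assumes a regular JVM rather than GWT client code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
public class ScriptExtractor {

    /** One <script> tag: either an external src URL or an inline body, never both. */
    public static final class ScriptRef {
        public final String src;  // null when the tag has no src attribute
        public final String body; // null when src is present (the body is ignored)

        ScriptRef(String src, String body) {
            this.src = src;
            this.body = body;
        }
    }

    // Alternative 1: a quoted src attribute, captured in group 1.
    // Alternative 2: the body of a tag without src, captured lazily in group 2.
    // \b keeps tags such as <scripts> from matching; DOTALL lets the body span lines.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b(?:[^>]*\\ssrc=['\"]([^'\"]*)['\"][^>]*>|[^>]*>(.*?)</script\\s*>)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    public static List<ScriptRef> extract(String html) {
        List<ScriptRef> result = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            if (m.group(1) != null) {
                result.add(new ScriptRef(m.group(1), null)); // src wins, body ignored
            } else {
                result.add(new ScriptRef(null, m.group(2)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<head><script src=\"http://test.com/some.js\"></script>"
                + "<script>if (a < b) { alert('hi'); }</script></head>";
        for (ScriptRef ref : extract(html)) {
            System.out.println(ref.src != null ? "SRC:  " + ref.src : "BODY: " + ref.body);
        }
    }
}
```

Returning a small value type rather than a bare `String` keeps the two cases apart, since a `src` URL has to be loaded while a body can be evaluated directly.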

Answered Jan 29 '23 12:01 by laz
