Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get regex to match multiple script tags?

I'm trying to return the contents of any tags in a body of text. I'm currently using the following expression, but it only captures the contents of the first tag and ignores any others after that.

Here's a sample of the html:

    <script type="text/javascript">
        alert('1');
    </script>

    <div>Test</div>

    <script type="text/javascript">
        alert('2');
    </script>

My regex looks like this:

//scripttext contains the sample
re = /<script\b[^>]*>([\s\S]*?)<\/script>/gm;
var scripts  = re.exec(scripttext);

When I run this on IE6, it returns 2 matches. The first containing the full tag, the 2nd containing alert('1').

When I run it on http://www.pagecolumn.com/tool/regtest.htm it gives me 2 results, each containing the script tags only.

like image 827
Geuis Avatar asked Sep 17 '09 21:09

Geuis


People also ask

Can you have multiple script tags?

Multiple <SCRIPT> Tags Up to this point all of the JavaScript Code was in one <SCRIPT> tag, this does not need to be the case. You can have as many <SCRIPT></SCRIPT> tags as you would like in a document. The <SCRIPT> tags are processed as they are encountered.

How will get all the matching tags in a HTML file?

If you want to find all HTML elements that match a specified CSS selector (id, class names, types, attributes, values of attributes, etc), use the querySelectorAll() method. This example returns a list of all <p> elements with class="intro" .

What does '$' mean in regex?

$ means "Match the end of the string" (the position after the last character in the string).

How do I match a regex pattern?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" .


1 Answers

The "problem" here is in how exec works. It matches only first occurrence, but stores current index (i.e. caret position) in lastIndex property of a regex. To get all matches simply apply regex to the string until it fails to match (this is a pretty common way to do it):

var scripttext = ' <script type="text/javascript">\nalert(\'1\');\n</script>\n\n<div>Test</div>\n\n<script type="text/javascript">\nalert(\'2\');\n</script>';

var re = /<script\b[^>]*>([\s\S]*?)<\/script>/gm;

var match;
while (match = re.exec(scripttext)) {
  // full match is in match[0], whereas captured groups are in ...[1], ...[2], etc.
  console.log(match[1]);
}
like image 109
kangax Avatar answered Sep 28 '22 00:09

kangax