
Parse contents of script tags inside string

Let's say I have the following string:

var myString = "<p>hello</p><script>console.log('hello')</script><h1>Test</h1><script>console.log('world')</script>"

I would like to use split to get an array with the contents of the script tags. e.g. I want my output to be:

["console.log('hello')", "console.log('world')"]

I tried myString.split(/[<script></script>]/), but did not get the expected output.

Any help is appreciated.

i_trope asked May 04 '15 14:05



1 Answer

You can't parse (X)HTML with regex.
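As for why the attempted split fails: square brackets in a regular expression form a character class, so /[<script></script>]/ matches any single one of the characters <, s, c, r, i, p, t, > or /, not the tags as a whole. A minimal sketch of the effect, using a shorter string (the sample variable is made up for illustration):

```javascript
// Hypothetical shorter sample, used only to illustrate the behaviour.
var sample = "<p>hi</p><script>x=1</script>";

// The character class splits on every one of: < s c r i p t > /
var pieces = sample.split(/[<script></script>]/);

// Drop the empty strings to see what is left of the input.
var nonEmpty = pieces.filter(function (piece) {
    return piece !== "";
});

console.log(nonEmpty); // only "h" and "x=1" survive
```

Even the paragraph text is damaged ("hi" becomes "h"), because the letter i is in the character class too.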

Instead, you can parse it using innerHTML.

var element = document.createElement('div');
element.innerHTML = myString; // Parse HTML properly (but unsafely)

However, this is not safe. Although innerHTML doesn't run the JS inside script elements, a malicious string can still run arbitrary JS through event handler attributes, e.g. with <img src="//" onerror="alert()">.

To avoid that problem, you can use DOMImplementation.createHTMLDocument to create a new document, which can be used as a sandbox.

var doc = document.implementation.createHTMLDocument(); // Sandbox
doc.body.innerHTML = myString; // Parse HTML properly

Alternatively, modern browsers support DOMParser with the 'text/html' type, which also parses the string into a separate document:

var doc = new DOMParser().parseFromString(myString, 'text/html');

Once the HTML string has been parsed to the DOM, you can use DOM methods like getElementsByTagName or querySelectorAll to get all the script elements.

var scriptElements = doc.getElementsByTagName('script');

Finally, [].map can be used to obtain an array with the textContent of each script element.

var arrayScriptContents = [].map.call(scriptElements, function(el) {
    return el.textContent;
});
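The reason for [].map.call is that getElementsByTagName returns an array-like HTMLCollection rather than a true array, so it has no map method of its own. Where ES2015 is available, Array.from should do the same job. A minimal sketch, using a plain array-like object as a stand-in for the collection so it runs without a DOM (fakeCollection and its contents are assumptions for illustration):

```javascript
// Stand-in for an HTMLCollection: indexed entries plus a length, no map().
var fakeCollection = {
    0: { textContent: "console.log('hello')" },
    1: { textContent: "console.log('world')" },
    length: 2
};

// Array.from accepts an array-like and an optional mapping function.
var contents = Array.from(fakeCollection, function (el) {
    return el.textContent;
});

console.log(contents);
```

With a real document, doc.getElementsByTagName('script') would take the place of fakeCollection.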

The full code would be:

var doc = document.implementation.createHTMLDocument(); // Sandbox
doc.body.innerHTML = myString; // Parse HTML properly
// Collect the text of every script element into an array
var arrayScriptContents = [].map.call(doc.getElementsByTagName('script'), function(el) {
    return el.textContent;
});
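If this is needed in more than one place, the steps above could be wrapped in a helper. The following is only a sketch: the function names scriptContentsOf and extractScriptContents are made up, and it assumes that any browser exposing DOMParser also supports the 'text/html' type, which may not hold for older ones.

```javascript
// Collection step: works on any object exposing querySelectorAll.
function scriptContentsOf(doc) {
    return Array.prototype.map.call(doc.querySelectorAll('script'), function (el) {
        return el.textContent;
    });
}

// Browser-only step: parse the string into a separate document first.
function extractScriptContents(html) {
    var doc;
    if (typeof DOMParser !== 'undefined') {
        // Assumption: 'text/html' is supported wherever DOMParser exists.
        doc = new DOMParser().parseFromString(html, 'text/html');
    } else {
        doc = document.implementation.createHTMLDocument(''); // Sandbox
        doc.body.innerHTML = html;
    }
    return scriptContentsOf(doc);
}
```

In a browser, extractScriptContents(myString) should give the array from the question. Keeping the collection step separate means it can be exercised with a stub document outside a browser.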
Oriol answered Sep 27 '22 19:09