
Trying to replace html tags using regex

For example, I'm trying to replace

<script type='text/javascript'>some stuff</script>

with:

<div type='text/javascript'>some stuff</div>

I'm currently testing with:

alert( o.replace( /(?:<\s*\/?\s*)(script)(?:\s*([^>]*)?\s*>)/gi ,'div') );

But what I'm getting is:

divsomestuffdiv

How can I get this to only replace the "script" portion and preserve the other markup and attribute characters?

Geuis asked May 11 '09 20:05




2 Answers

You have to keep the opening and closing tag brackets, so capture them and put them back in the replacement. Try this:

o.replace(/(<\s*\/?\s*)script(\s*([^>]*)?\s*>)/gi ,'$1div$2')
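The original attempt lost the markup because a non-capturing group `(?:...)` is still part of the match, and `replace` swaps out the entire match, not just the captured `script`. Below is a minimal sketch of the capture-and-reinsert idea. The function name `renameScriptTags` and the added `\b` are assumptions for illustration, not part of the answer above, and the pattern assumes no attribute value contains a `>` character.

```javascript
// Hypothetical helper (not from the original answer): rename every
// <script ...> and </script> tag to <div ...> and </div>.
function renameScriptTags(html) {
  // $1 is "<" or "</" (optional whitespace allowed),
  // $2 is everything after the tag name up to and including ">".
  // \b keeps the pattern from matching longer names such as <scripted>.
  return html.replace(/(<\s*\/?\s*)script\b([^>]*>)/gi, "$1div$2");
}

const input = "<script type='text/javascript'>some stuff</script>";
console.log(renameScriptTags(input));
// <div type='text/javascript'>some stuff</div>
```

Everything outside the tag name is copied back through `$1` and `$2`, so attributes and text content are left untouched.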
Gumbo answered Sep 17 '22 17:09



A naive but readable way, I suppose, would be to do it in two passes: first match and replace the

<script

part with

<div

and then run a second pass that matches

</script>

and replace it with

</div>
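The two passes could be sketched roughly like this. The function name `replaceInTwoPasses` is hypothetical, and the patterns assume the tags have no whitespace between `<` and the tag name.

```javascript
// Hypothetical helper illustrating the two-pass idea described above.
function replaceInTwoPasses(html) {
  // Pass 1: rewrite only the start of each opening tag; attributes stay as-is.
  const opened = html.replace(/<script\b/gi, "<div");
  // Pass 2: rewrite each closing tag.
  return opened.replace(/<\/script\s*>/gi, "</div>");
}

const input = "<script type='text/javascript'>some stuff</script>";
console.log(replaceInTwoPasses(input));
// <div type='text/javascript'>some stuff</div>
```

Each pattern is simple enough to read at a glance, at the cost of scanning the string twice.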

Simonw answered Sep 18 '22 17:09
