 

Remove empty tags using RegEx

I want to delete empty tags such as <label></label> and <font> </font>, so that:

<label></label><form></form>
<p>This is <span style="color: red;">red</span> 
<i>italic</i>
</p>

will be cleaned as:

<p>This is <span style="color: red;">red</span> 
<i>italic</i>
</p>

I have this RegEx in JavaScript. It deletes the empty tags, but it also deletes this: "<i>italic</i></p>"

str=str.replace(/<[\S]+><\/[\S]+>/gim, "");

What am I missing?

asked Jun 28 '10 at 02:06 by bobby



1 Answer

You have "not whitespace" ([\S]) as your character class, and that also matches <, > and /, which means "<i>italic</i></p>" will match. The first half of your regex will match "<(i>italic</i)>" and the second half "</(p)>". (I've used parentheses to show what each [\S]+ matches.)
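To see that split for yourself, here is a small sketch (Node or a browser console) that adds capture groups to the question's pattern. The capture groups and variable names are only for illustration; they are not part of the original regex.

```javascript
// The question's pattern, with capture groups added so we can see
// what each [\S]+ actually consumes.
const original = /<([\S]+)><\/([\S]+)>/;

const input = "<i>italic</i></p>";
const m = original.exec(input);

console.log(m[0]); // whole match: "<i>italic</i></p>"
console.log(m[1]); // first  [\S]+: "i>italic</i"
console.log(m[2]); // second [\S]+: "p"
```

Because [\S]+ is greedy and may contain ">", "<" and "/", the first group swallows a whole element plus the start of the next closing tag.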

Change this:

/<[\S]+><\/[\S]+>/

To this:

/<[^/>][^>]*><\/[^>]+>/
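A quick check of the corrected pattern against the question's sample, as a sketch. The g flag is added here so that every empty pair is removed (the question's original used gim), and the input is the question's markup with trailing whitespace dropped.

```javascript
// The corrected pattern, with the g flag so every empty pair is removed.
const emptyTag = /<[^/>][^>]*><\/[^>]+>/g;

// The question's sample input (trailing whitespace omitted).
const html = [
  "<label></label><form></form>",
  '<p>This is <span style="color: red;">red</span>',
  "<i>italic</i>",
  "</p>",
].join("\n");

const cleaned = html.replace(emptyTag, "");
// <label></label> and <form></form> are gone; <p>, <span> and <i> are kept
// because none of them is immediately followed by a closing tag.
console.log(cleaned);

// [^/>] stops the opening half from starting with "/", so a closing tag
// such as "</i>" can no longer be taken as the start of a match.
console.log("<i>italic</i></p>".replace(emptyTag, "")); // "<i>italic</i></p>"
```

The first line of the sample becomes empty, so the result starts with a newline; trim it if that matters. Note that "<font> </font>" from the question is not removed by this pattern, because it requires the closing tag to follow the opening tag immediately.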

Overall, you should really be using a proper HTML processor, but if you're just munging HTML soup, this should suffice :)
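If you do stay with regex for HTML soup, here is a stricter variant, offered as a sketch rather than as part of the answer above; removeEmptyTags and EMPTY_PAIR are hypothetical names. It requires the closing tag to have the same name as the opening tag, allows whitespace-only content (so "<font> </font>" from the question is removed), and repeats until nothing changes so that nested empty elements are cleaned up too.

```javascript
// Hypothetical helper (not from the answer): a stricter variant for HTML soup.
//  - ([a-z][a-z0-9]*) captures the opening tag's name, and <\/\1> requires the
//    closing tag to have the same name, so '<img src="x.png"></a>' is not removed.
//  - \s* allows whitespace-only content, so "<font> </font>" is removed too.
const EMPTY_PAIR = /<([a-z][a-z0-9]*)\b[^>]*>\s*<\/\1\s*>/gi;

function removeEmptyTags(html) {
  let previous;
  // Repeat until nothing changes: removing "<b></b>" from "<p><b></b></p>"
  // leaves "<p></p>", which is only caught on the next pass.
  do {
    previous = html;
    html = html.replace(EMPTY_PAIR, "");
  } while (html !== previous);
  return html;
}

console.log(removeEmptyTags("<font> </font><p><b></b></p>x")); // "x"
console.log(removeEmptyTags('<a><img src="x.png"></a>'));      // unchanged
```

It still removes every empty element, including ones that are meaningful when empty, such as <script src="…"></script> or an empty <td></td>, and it will misread attributes that contain ">". A real HTML parser remains the safer option.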

answered Oct 19 '22 at 04:10 by porges
