
Javascript Regex, Removing unclosed tags

I'm looking for a JavaScript regex solution to remove unclosed tags, for example:

<div></div><span>

As you can see, I want to remove the <span> element. I know it's a bad idea to use regex on markup, but it's required for my project. This is the regex pattern I made, but it didn't work:

/<([a-z]+?)>([\s\S]*?)(?!<\/\1>)/g

I'm using JavaScript's replace to replace all matches with "". What I'm trying to do with my pattern is match only unclosed tags. About the pattern:

  1. [a-z]: I know HTML tags can contain =, ", etc. I'm looking for a simple pattern that I can play with and edit, so I started with [a-z].
  2. I used a negative lookahead, (?!...), to reject matches for closing tags.

I know my pattern isn't working. If anyone has an idea, I would be very thankful.

Edit:

I'm aware that tags may be nested. If that is the case, I want to remove the whole nested tree. I only want to keep one level of HTML, for example:

<div><span></span></div><p></p>

So if the next tag after the <div> is not </div>, remove it.

Aviel Fedida asked Feb 25 '26 06:02


1 Answer

First of all, let's see what the OP said:

  • I know it's a bad idea to use regex on markup but it's required for my project.
  • I only want to keep 1 level of html

This can be achieved.

You were on the right track. However, you shouldn't have used a negative lookahead, (?!...), to reject the closing tag. You want to require the closing tag instead. That way an unclosed tag can never match, which is our goal after all.

Now your regex looks like this:
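To see why the lookahead version fails, here is a small sketch that runs the question's original pattern against the question's own input. The variable names are made up for illustration. Because [\s\S]*? is lazy, it simply grows by one character until the lookahead no longer sees the closing tag, so opening tags match whether they are closed or not:

```javascript
// The question's original pattern, with the negative lookahead.
const original = /<([a-z]+?)>([\s\S]*?)(?!<\/\1>)/g;
const input = "<div></div><span>";

// The lazy group swallows the "<" of "</div>" so the lookahead passes,
// which means the closed <div> is matched just like the unclosed <span>.
const matches = input.match(original);
const replaced = input.replace(original, "");

console.log(matches);  // [ '<div><', '<span>' ]
console.log(replaced); // '/div>'
```

The replace call leaves a broken fragment behind instead of the intact <div></div> pair.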

/<([a-z]+?)>([\s\S]*?)(<\/\1>)/g

We can drop the second and third capture groups, as they are not needed:

/<([a-z]+?)>[\s\S]*?<\/\1>/g

If we test this regex on the provided code, we get the following:

"<div><span></span></div><p></p>".match(/<([a-z]+?)>[\s\S]*?<\/\1>/g)
["<div><span></span></div>", "<p></p>"]

It seems that our regex matches too many characters: the first match spans the nested <span>. We must stop the match at the "<" character, as it starts a new tag. [^<] means "any character except <".

"<div><span></span></div><p></p>".match(/<([a-z]+?)>[^<]*?<\/\1>/g)
["<span></span>", "<p></p>"]

Finally, we can just join the matched results. Note that this keeps only the elements that contain no other tags:

"<div><span></span></div><p></p>".match(/<([a-z]+?)>[^<]*?<\/\1>/g).join("")
"<span></span><p></p>"
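One caveat with the one-liner: match returns null when nothing matches, so calling join on the result would throw. A minimal sketch of a wrapper that guards against that is below. The function name keepClosedLeafTags is hypothetical; the pattern is the one from this answer.

```javascript
// Hypothetical helper; the regex is the final pattern from the answer.
function keepClosedLeafTags(html) {
  const closedPair = /<([a-z]+?)>[^<]*?<\/\1>/g;
  // match() returns null when there are no matches,
  // so fall back to an empty array before joining.
  return (html.match(closedPair) || []).join("");
}

console.log(keepClosedLeafTags("<div><span></span></div><p></p>")); // "<span></span><p></p>"
console.log(keepClosedLeafTags("<div></div><span>"));               // "<div></div>"
console.log(keepClosedLeafTags("<span>"));                          // ""
```

On the question's first example, the unclosed <span> is dropped and the <div></div> pair survives. Keep in mind that any text sitting between the matched elements is dropped as well, since only the matches are joined.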

Woohoo! I will leave the first part of the regex (the tag-name part, which does not yet handle attributes) to you, as it was not part of the question. I hope this was helpful. I am open to further questions.
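As a starting point for that first part, here is one possible sketch that tolerates attributes in the opening tag. It is an assumption on my side, not something tested against real-world markup: it allows letters and digits in the tag name, ignores case, and breaks if an attribute value contains ">". The function name is hypothetical.

```javascript
// Hypothetical variant: same idea as above, but the opening tag may carry attributes.
function keepClosedLeafTagsWithAttributes(html) {
  // \b stops the tag name from matching only a prefix (e.g. "b" inside "<br>").
  // [^>]* skips over attributes, assuming no ">" appears inside a value.
  const closedPair = /<([a-z][a-z0-9]*)\b[^>]*>[^<]*<\/\1>/gi;
  return (html.match(closedPair) || []).join("");
}

console.log(keepClosedLeafTagsWithAttributes('<a href="x">link</a><span>'));
// '<a href="x">link</a>'
console.log(keepClosedLeafTagsWithAttributes('<div class="a"><span></span></div><p id="b"></p>'));
// '<span></span><p id="b"></p>'
```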

tsikov answered Feb 26 '26 19:02


