I'm trying to remove all HTML tags except p, a and img tags. Right now I have:
content.replace(/(<([^>]+)>)/ig,"");
But this removes all HTML tags.
These are examples of the content returned by the API:
<table id="content_LETTER.BLOCK9" border="0" width="100%" cellspacing="0" cellpadding="0" bgcolor="#F7EBF5">
<tbody><tr><td class="ArticlePadding" colspan="1" rowspan="1" align="left" valign="top"><div>what is the opposite of...[] rest of text
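You can match the tags you want to keep in a capture group and, using alternation, match all other tags without capturing them. Then replace each match with $1, which puts a kept tag back unchanged and is empty for every other tag: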
(<\/?(?:a|p|img)[^>]*>)|<[^>]+>
Demo: https://regex101.com/r/Sm4Azv/2
And the JavaScript demo:
var input = 'b<body>b a<a>a h1<h1>h1 p<p>p p</p>p img<img />img';
var output = input.replace(/(<\/?(?:a|p|img)[^>]*>)|<[^>]+>/ig, '$1');
console.log(output);
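One thing to watch for: without a word boundary after the tag name, the pattern above also keeps tags whose names merely start with a or p, such as <abbr>, <article> or <pre>. A minimal sketch of a stricter variant follows. The helper name stripTagsExcept and the sample string are made up for illustration, and the helper assumes the allowed names are plain tag names with no regex metacharacters.

```javascript
// Hypothetical helper: keeps only the listed tags and drops every other tag.
// Assumes `allowed` contains plain tag names (no regex metacharacters).
function stripTagsExcept(html, allowed) {
  // \b after the tag name stops "a" from also matching <abbr> or <article>.
  const pattern = new RegExp(
    '(<\\/?(?:' + allowed.join('|') + ')\\b[^>]*>)|<[^>]+>',
    'ig'
  );
  // Kept tags land in group 1; for all other tags group 1 is empty,
  // so '$1' replaces them with nothing.
  return html.replace(pattern, '$1');
}

const sample =
  '<article><p>text <abbr>HTML</abbr> <a href="#">link</a></p><pre>code</pre></article>';
const cleaned = stripTagsExcept(sample, ['a', 'p', 'img']);
console.log(cleaned);
// <p>text HTML <a href="#">link</a></p>code
```

Hyphenated custom element names such as <a-b> would still pass the \b check, so this is a tightening rather than a full HTML sanitizer.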
You can use the regex below to remove all HTML tags except a, p and img:
<\/?(?!a)(?!p)(?!img)\w*\b[^>]*>
Replace with an empty string.
var text = '<tr><p><img src="url" /> some text <img another></img><div><a>blablabla</a></div></p></tr>';
var output = text.replace(/<\/?(?!a)(?!p)(?!img)\w*\b[^>]*>/ig, '');
console.log(output);
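Note that the separate lookaheads (?!a)(?!p)(?!img) reject any tag name that begins with a, p or img, so tags like <address> or <abbr> are left in place. A possible tightening, shown here as a sketch rather than a definitive fix, is a single lookahead that rejects only the exact names. The variable names and the sample markup are made up for illustration.

```javascript
// One lookahead that rejects only the exact names a, p and img
// (the \b inside the lookahead requires the name to end there).
const stripPattern = /<\/?(?!(?:a|p|img)\b)\w+\b[^>]*>/ig;

const markup =
  '<address><p>Hi <abbr>you</abbr> <img src="url" /><a>link</a></p></address>';

// Pattern from the answer: <address> and <abbr> start with "a", so they survive.
const original = markup.replace(/<\/?(?!a)(?!p)(?!img)\w*\b[^>]*>/ig, '');

// Tightened pattern: only a, p and img survive.
const tightened = markup.replace(stripPattern, '');

console.log(original);
// <address><p>Hi <abbr>you</abbr> <img src="url" /><a>link</a></p></address>
console.log(tightened);
// <p>Hi you <img src="url" /><a>link</a></p>
```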