I want to strip the tags from an HTML string but preserve its line breaks.
I want the same behaviour as copying the text in a browser and pasting it into Notepad.
For example, code that converts:
<div>x1</div><div>x2</div>
to x1\nx2
<p>x1</p><p>x2</p>
to x1\nx2
<b>x1</b><i>x2</i>
to x1x2
x1<br>x2
to x1\nx2
Simply removing all tags with /<.*?>/g does not work, because the line breaks are lost.
Creating a dummy <div>, setting its innerHTML,
and reading its textContent
also removes the line breaks.
Any help?
The <br> HTML element produces a line break in text (carriage-return).
The newline character is \n in JavaScript and many other languages. To add a new line to a string, insert the \n character wherever you need a line break.
For the opposite direction, a regular expression can be used with the replace() method to replace all the line breaks in a string with <br>. The pattern /(\r\n|\r|\n)/ matches a line break, and the g flag applies the replacement to every occurrence in the string.
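As a minimal sketch of the line-break-to-<br> replacement described above (the function name newlinesToBr is hypothetical and not part of any library):

```javascript
// Hypothetical helper: converts every line break in a string to a <br> tag.
function newlinesToBr(text) {
  // \r\n is listed first so a Windows line break becomes one <br>, not two.
  return text.replace(/(\r\n|\r|\n)/g, "<br>");
}

console.log(newlinesToBr("x1\nx2\r\nx3")); // x1<br>x2<br>x3
```

Without the g flag, replace() would stop after the first line break.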
stripHtml( html ) changes the provided HTML string into a plain text string by converting <br>, <p>, and <div> to line breaks, stripping all other tags, and converting escaped characters into their display values.
How does this work for you? This will replace every occurrence of <br>, </div>, and </p> with a \n, and then strip the remaining tags. It's goofy, but it's at least a start.
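// Turn <br> (also <br/> and <br />), </div>, and </p> into newlines, then strip the remaining tags.
const fixed = text_to_fix.replace(/<(?:br\s*\/?|\/div|\/p)>/gi, "\n")
  .replace(/<.*?>/g, "");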
This doesn't work for all HTML, however, just the tags you mentioned. Note that a closing </div> or </p> at the very end of the input also leaves a trailing \n.
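Building on the same idea, here is a slightly fuller sketch that reproduces the four examples from the question. The function name htmlToPlainText, the short entity table, and the choice to trim trailing newlines are all assumptions of mine, not part of the answer above, and this is still a regex approach rather than a real HTML parser:

```javascript
// Hypothetical helper: a regex-based sketch, not a full HTML parser.
function htmlToPlainText(html) {
  // Assumed, deliberately incomplete table of common entities.
  const entities = {
    "&nbsp;": " ", "&lt;": "<", "&gt;": ">",
    "&quot;": '"', "&#39;": "'", "&amp;": "&",
  };
  return html
    // <br>, <br/>, <br /> and closing </div> or </p>, in any letter case
    .replace(/<(?:br\s*\/?|\/div|\/p)\s*>/gi, "\n")
    // strip every remaining tag, including tags that span several lines
    .replace(/<[^>]*>/g, "")
    // decode the listed entities in a single pass, so "&amp;lt;" becomes "&lt;"
    .replace(/&(?:nbsp|lt|gt|quot|#39|amp);/g, (match) => entities[match])
    // drop the newline(s) left by the final closing block tag
    .replace(/\n+$/, "");
}

console.log(JSON.stringify(htmlToPlainText("<div>x1</div><div>x2</div>"))); // "x1\nx2"
console.log(JSON.stringify(htmlToPlainText("<b>x1</b><i>x2</i>")));         // "x1x2"
```

Entities are decoded only after the tags are stripped, so escaped text such as &lt;b&gt; survives as literal <b> instead of being removed as a tag. Tags with attributes (for example a <br> carrying a class) are simply stripped without producing a line break, and other block-level elements are not handled.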