I need to get only the text content from an HTML string, with a space or a line break separating the text content of different elements.
For example, the HTML string might be:
<ul>
<li>First</li>
<li>Second</li>
</ul>
What I want:
First Second
or
First
Second
I've tried to get the text content by first wrapping the entire string inside a div and then reading its textContent using third-party libraries. However, there are no spaces or line breaks between the text content of different elements, which I specifically require (i.e. I get FirstSecond, which is not what I want).
The only solution I can think of right now is to build a DOM tree and recursively visit the nodes that contain text, appending each node's text to a string with spaces in between. Is there a cleaner, neater, and simpler solution than this?
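For reference, the recursive approach described above could be sketched roughly as follows. This is only an illustration: the helper names collectText and textWithSeparator are made up for this example, and the code assumes nodes that follow the standard DOM Node interface (nodeType, nodeValue, childNodes).

```javascript
// Node type constant for text nodes, as defined by the DOM specification.
const TEXT_NODE = 3;

// Recursively collect the trimmed, non-empty text of every text node
// under `node`, in document order. (Hypothetical helper.)
function collectText(node, parts = []) {
  if (node.nodeType === TEXT_NODE) {
    const text = node.nodeValue.trim();
    if (text) parts.push(text);
  } else if (node.childNodes) {
    for (const child of node.childNodes) {
      collectText(child, parts);
    }
  }
  return parts;
}

// Join the collected pieces with a separator such as ' ' or '\n'.
// (Hypothetical helper.)
function textWithSeparator(root, separator = ' ') {
  return collectText(root).join(separator);
}
```

In a browser, the root could come from new DOMParser().parseFromString(htmlString, 'text/html').body, and textWithSeparator(root, '\n') would then put each piece of text on its own line.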
In your terminal, install the html-to-text npm package:
npm install html-to-text
Then in JavaScript:
const { convert } = require('html-to-text'); // Import the library
const htmlString = `
<ul>
<li>First</li>
<li>Second</li>
</ul>
`;
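const text = convert(htmlString, { wordwrap: 130 });
console.log(text);
// Prints each list item on its own line. With the default options,
// unordered list items are prefixed with " * ":
//  * First
//  * Second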
You can try stripping the HTML tags with a regex. For your example, try the following:
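let str = `<ul>
<li>First</li>
<li>Second</li>
</ul>`;
console.log(str); // original HTML
// Match opening and closing <ul> and <li> tags, including any attributes
const re = /<\/?!?(li|ul)[^>]*>/g;
// Remove the tags, then trim the leftover leading and trailing line breaks
str = str.replace(re, '').trim();
console.log(str);
// First
// Second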
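The same idea can be generalized to any tag, not just ul and li. Below is a minimal sketch, assuming simple, well-formed markup; the stripTags name and its separator parameter are invented for this example. It does not decode entities such as &amp;amp;, it keeps the contents of script and style elements, and it will misbehave if a ">" appears inside an attribute value, so a real HTML parser is safer for untrusted input.

```javascript
// Replace every tag with a break, then join the remaining text pieces
// with the chosen separator. (Hypothetical helper, regex-based.)
function stripTags(html, separator = ' ') {
  return html
    .replace(/\s+/g, ' ')          // collapse whitespace, including line breaks
    .replace(/<[^>]*>/g, '\n')     // turn each tag into a break marker
    .split('\n')                   // one piece per run of text between tags
    .map((piece) => piece.trim())
    .filter((piece) => piece.length > 0)
    .join(separator);
}

const html = `<ul>
<li>First</li>
<li>Second</li>
</ul>`;

console.log(stripTags(html));       // First Second
console.log(stripTags(html, '\n')); // First and Second on separate lines
```

Because every tag acts as a boundary, inline elements such as b or span also split the text, which matches the requirement of separating the text content of different elements.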