Let's say I have the following:
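var cheerio = require('cheerio');
var $ = cheerio.load('<html><body><ul><li>One</li><li>Two</li></ul></body></html>');
// Collect every text node under <html> and concatenate their contents
var t = $('html').find('*').contents().filter(function() {
  return this.type === 'text';
}).text();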
I get:
OneTwo
Instead of:
One Two
I get the same result with $('html').text(). So what I need is a way to inject a separator, such as a space or \n, between the text nodes.
Note: this is not a jQuery front-end question; it is a Node.js back-end issue with Cheerio and HTML parsing.
This seems to do the trick:
var t = $('html *').contents().map(function() {
  return (this.type === 'text') ? $(this).text() : '';
}).get().join(' ');
console.log(t);
Result:
One Two
Just improved my solution a little bit:
var t = $('html *').contents().map(function() {
  return (this.type === 'text') ? $(this).text() + ' ' : '';
}).get().join('');
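For illustration, here is the same idea as a minimal sketch that does not depend on Cheerio. It assumes the node shape the snippets above rely on (each node has a `type`, text nodes carry their content in `data`, and elements list their children in `children`). The helper name `textWithSeparator` and the hand-built `tree` are hypothetical, made up for this example.

```javascript
// Hypothetical helper: walks a node tree and joins the non-empty text nodes
// with the given separator. Assumes nodes look like { type, data, children }.
function textWithSeparator(node, separator) {
  var parts = [];
  (function walk(n) {
    if (n.type === 'text') {
      var s = n.data.trim();
      if (s) parts.push(s); // skip whitespace-only text nodes
      return;
    }
    (n.children || []).forEach(walk);
  })(node);
  return parts.join(separator);
}

// Hand-built stand-in for <ul><li>One</li><li>Two</li></ul>
var tree = {
  type: 'tag', name: 'ul', children: [
    { type: 'tag', name: 'li', children: [{ type: 'text', data: 'One' }] },
    { type: 'tag', name: 'li', children: [{ type: 'text', data: 'Two' }] }
  ]
};

console.log(textWithSeparator(tree, ' ')); // One Two
```

Collecting the pieces in an array and joining once avoids the trailing separator that appending ' ' to every text node leaves behind.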
You can use the TextVersionJS package to generate a plain-text version of an HTML string. It works both in the browser and in Node.js.
var createTextVersion = require("textversionjs");
var yourHtml = "<h1>Your HTML</h1><ul><li>goes</li><li>here.</li></ul>";
var textVersion = createTextVersion(yourHtml);
Install it from npm; for browser use you can bundle it with Browserify, for example.