Using the HTML below, I'm attempting to extract the HTML of each paragraph. However, I cannot find any way to turn the nodes back into HTML or query objects.
The following HTML is stored in a string (var html = ...):
<article>
<p> p1 </p>
<p> p2 </p>
</article>
The HTML is loaded like this:
var $ = require('cheerio').load(html)
var paragraphs = $('p').toArray().map(p => /* I want the html at this point */ )
How do I get the HTML of these paragraphs?
NOTE: for clarity, I'm calling the return value of cheerio.load a "query object" and the return value of the toArray method "DOM nodes", for lack of a better phrase.
According to the W3C HTML DOM standard, everything in an HTML document is a node: The entire document is a document node. Every HTML element is an element node.
Cheerio is not a web browser. It parses the markup but does not interpret it the way a web browser does. Specifically, it does not produce a visual rendering, apply CSS, load external resources, or execute JavaScript, which single-page applications (SPAs) commonly rely on.
Cheerio is a JavaScript library used for web scraping in server-side code. Web scraping is a scripted method of extracting data from a website that can be tailored to your use case. Node.js is often used as the server-side platform.
You can use $.html:
var paragraphs = $('p').toArray().map(p => {
console.log($.html(p));
return $.html(p);
});
The documentation shows an example using a selector; however, Cheerio DOM elements also work as expected:
If you want to return the outerHTML, you can use $.html(selector):
$.html('.pear') //=> <li class="pear">Pear</li>