
How to turn Cheerio DOM nodes back into html?

Using the HTML below, I'm attempting to extract the HTML of each paragraph. However, I cannot find a way to turn the nodes back into HTML or into query objects.

The markup below is held in a string, var html = ...

<article>
    <p> p1 </p>
    <p> p2 </p>
</article>

The HTML is loaded like this:

var $ = require('cheerio').load(html)
var paragraphs = $('p').toArray().map(p => /* I want the html at this point */ )

How do I get the HTML of these paragraphs?

NOTE: for clarity I'm calling the return value of cheerio.load a "query object" and the return of the toArray method DOM nodes; for lack of a better phrase.

Aage Torleif asked Jun 11 '16 18:06



1 Answer

You can use $.html:

var paragraphs = $('p').toArray().map(p => {
    console.log($.html(p));
    return $.html(p);
});

The documentation shows an example using a selector, but passing a cheerio DOM element works the same way. From the documentation:

If you want to return the outerHTML you can use $.html(selector):

$.html('.pear') //=> <li class="pear">Pear</li>

lucasjackson answered Sep 21 '22 14:09