 

Cheerio: how to ignore elements of a certain tag

I am scraping the body of the webpage:

axios.get(url)
    .then(function (response) {
        var $ = cheerio.load(response.data);
        var body = $('body').text();
    });

The problem is that I want to exclude the contents of the <footer> tag. How do I do that?

Dylan Czenski asked Sep 01 '25 00:09



1 Answer

Cheerio builds a pseudo-DOM when it parses the HTML, and you can manipulate that DOM much as you would manipulate the DOM in a browser. In your case, you can remove elements from it before reading the text, using methods such as:

 .remove()
 .replaceWith()
 .empty()
 .html()

So, the basic idea is to use a selector to find the footer element and then remove it:

$('footer').remove();

Then, fetch the text after you've removed those elements:

var body = $('body').text();
jfriend00 answered Sep 02 '25 14:09
