I am scraping the body of a webpage:
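var axios = require('axios');
var cheerio = require('cheerio');

axios.get(url)
  .then(function (response) {
    // Parse the fetched HTML and read all text inside <body>
    var $ = cheerio.load(response.data);
    var body = $('body').text();
  });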
The problem is, I want to exclude the contents of the <footer> tag. How do I do that?
cheerio builds a pseudo-DOM when it parses the HTML, and you can manipulate that DOM much as you would manipulate the DOM in a browser. In your specific case, you can take content out of the DOM with any of several methods, such as:
.remove()
.replaceWith()
.empty()
.html()
So, the basic idea is to use a selector to find the footer element and then remove it:
$('footer').remove();
Then, fetch the text after you've removed those elements:
var body = $('body').text();