I'm trying to get the title tag of a URL with cheerio, but I'm getting empty string values. This is my code:
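```javascript
var express = require('express');
var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs');

var app = express();

app.get('/scrape', function(req, res){
    var url = 'http://nrabinowitz.github.io/pjscrape/';

    request(url, function(error, response, html){
        if (error) {
            return res.status(500).send('Request failed: ' + error.message);
        }

        var $ = cheerio.load(html);
        var title, release, rating;
        var json = { title : "", release : "", rating : ""};

        $('title').filter(function(){
            var data = $(this);
            // Note: .children() returns child elements only, and <title> contains just a text node
            title = data.children().first().text();
            release = data.children().last().children().text();
            json.title = title;
            json.release = release;
        });

        $('.star-box-giga-star').filter(function(){
            var data = $(this);
            rating = data.text();
            json.rating = rating;
        });

        // Write the results only after a successful request
        fs.writeFile('output.json', JSON.stringify(json, null, 4), function(err){
            if (err) return console.error(err);
            console.log('File successfully written! - Check your project directory for the output.json file');
        });

        // Finally, we'll just send out a message to the browser reminding you that this app does not have a UI.
        res.send('Check your console!');
    });
});

app.listen(8081); // port chosen arbitrarily for this example
```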
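Attributes can be retrieved with the attr function:

```javascript
import fetch from 'node-fetch';
import { load } from 'cheerio';

const url = 'http://webcode.me';

// Top-level await requires an ES module (e.g. a .mjs file)
const response = await fetch(url);
const body = await response.text();

let $ = load(body);
// attr() with no argument returns all attributes of the first matched <link>
let lnEl = $('link');
let attrs = lnEl.attr();
console.log(attrs);
```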
Cheerio can parse nearly any HTML or XML document.
Cheerio is a JavaScript library used for web scraping on the server side. Web scraping is a scripted way of extracting data from a website, and it can be tailored to your use case. Node.js is often used as the server-side platform.
We install cheerio, request, and local-web-server. Inside the project directory, which contains the index.html file, we start the local web server; it serves index.html at three different addresses. The first example gets the title of the document and prints it.
If you want to use cheerio to scrape a web page, you first need to fetch the markup with a package such as axios or node-fetch. In this section, you will learn how to scrape a web page using cheerio.
To make HTTP requests for the page's data we will use the Got library, and to parse the HTML we'll use Cheerio. Cheerio implements a subset of core jQuery, which makes it a familiar tool for many JavaScript developers. Let's dive into how to use it.
Cheerio provides the .each method for looping through several selected elements. For example, we can select all the li elements, loop through them with .each, and log the text content of each list item to the terminal.
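```javascript
var request = require('request');
var cheerio = require('cheerio');

var url = 'http://nrabinowitz.github.io/pjscrape/';

request(url, function (error, response, body) {
    if (!error && response.statusCode === 200) {
        var $ = cheerio.load(body);
        // Call .text() on <title> itself to get its text content
        var title = $("title").text();
        console.log(title);
    }
});
```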
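This extracts the text contained within the `<title>` tag. Calling `.text()` on the `<title>` element itself is what matters: `.children()` returns only child elements, and `<title>` contains nothing but a text node, so `.children().first().text()` yields an empty string.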
If Robert Ryan's solution still doesn't work, I'd be suspicious of the formatting of the original page, which may be malformed somehow.
In my case I was accepting gzip and other compression but never decoding the response, so cheerio was trying to parse compressed binary data. Logging the original body to the console, I could see binary data instead of plain-text HTML.