I'm trying to use headless Chrome and Puppeteer to run our Javascript tests, but I can't extract the results from the page. Based on this answer, it looks like I should use page.evaluate()
. That section even has an example that looks like what I need.
const bodyHandle = await page.$('body');
const html = await page.evaluate(body => body.innerHTML, bodyHandle);
await bodyHandle.dispose();
As a full example, I tried to convert that to a script that will extract my name from my user profile on Stack Overflow. Our project is using Node 6, so I converted the await
expressions to use .then()
.
const puppeteer = require('puppeteer');
puppeteer.launch().then(function(browser) {
browser.newPage().then(function(page) {
page.goto('https://stackoverflow.com/users/4794').then(function() {
page.$('h2.user-card-name').then(function(heading_handle) {
page.evaluate(function(heading) {
return heading.innerText;
}, heading_handle).then(function(result) {
console.info(result);
browser.close();
}, function(error) {
console.error(error);
browser.close();
});
});
});
});
});
When I run that, I get this error:
$ node get_user.js
TypeError: Converting circular structure to JSON
at Object.stringify (native)
at args.map.x (/mnt/data/don/git/Kive/node_modules/puppeteer/node6/helper.js:30:43)
at Array.map (native)
at Function.evaluationString (/mnt/data/don/git/Kive/node_modules/puppeteer/node6/helper.js:30:29)
at Frame.<anonymous> (/mnt/data/don/git/Kive/node_modules/puppeteer/node6/FrameManager.js:376:31)
at next (native)
at step (/mnt/data/don/git/Kive/node_modules/puppeteer/node6/FrameManager.js:355:24)
at Promise (/mnt/data/don/git/Kive/node_modules/puppeteer/node6/FrameManager.js:373:12)
at fn (/mnt/data/don/git/Kive/node_modules/puppeteer/node6/FrameManager.js:351:10)
at Frame._rawEvaluate (/mnt/data/don/git/Kive/node_modules/puppeteer/node6/FrameManager.js:375:3)
The problem seems to be with serializing the input parameter to page.evaluate()
. I can pass in strings and numbers, but not element handles. Is the example wrong, or is it a problem with Node 6? How can I extract the text of a DOM node?
Puppeteer is a Node library which provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome or Chromium.
We can get element text in Puppeteer. This is done with the help of the textContent property. This property of the element is passed as a parameter to the getProperty method.
To use Puppeteer with a different version of Chrome or Chromium, pass in the executable's path when creating a Browser instance: const browser = await puppeteer. launch({ executablePath: '/path/to/Chrome' }); You can also use Puppeteer with Firefox Nightly (experimental support).
I found three solutions to this problem, depending on how complicated your extraction is. The simplest option is a related function that I hadn't noticed: page.$eval()
. It basically does what I was trying to do: combines page.$()
and page.evaluate()
. Here's an example that works:
const puppeteer = require('puppeteer');
puppeteer.launch().then(function(browser) {
browser.newPage().then(function(page) {
page.goto('https://stackoverflow.com/users/4794').then(function() {
page.$eval('h2.user-card-name', function(heading) {
return heading.innerText;
}).then(function(result) {
console.info(result);
browser.close();
});
});
});
});
That gives me the expected result:
$ node get_user.js
Don Kirkby top 2% overall
I wanted to extract something more complicated, but I finally realized that the evaluation function is running in the context of the page. That means you can use any tools that are loaded in the page, and then just send strings and numbers back and forth. In this example, I use jQuery in a string to extract what I want:
const puppeteer = require('puppeteer');
puppeteer.launch().then(function(browser) {
browser.newPage().then(function(page) {
page.goto('https://stackoverflow.com/users/4794').then(function() {
page.evaluate("$('h2.user-card-name').text()").then(function(result) {
console.info(result);
browser.close();
});
});
});
});
That gives me a result with the whitespace intact:
$ node get_user.js
Don Kirkby
top 2% overall
In my real script, I want to extract the text of several nodes, so I need a function instead of a simple string:
const puppeteer = require('puppeteer');
puppeteer.launch().then(function(browser) {
browser.newPage().then(function(page) {
page.goto('https://stackoverflow.com/users/4794').then(function() {
page.evaluate(function() {
return $('h2.user-card-name').text();
}).then(function(result) {
console.info(result);
browser.close();
});
});
});
});
That gives the exact same result. Now I need to add error handling, and maybe reduce the indentation levels.
Using await/async
and $eval
, the syntax looks like the following:
await page.goto('https://stackoverflow.com/users/4794')
const nameElement = await context.page.$eval('h2.user-card-name', el => el.text())
console.log(nameElement)
I use page.$eval
const text = await page.$eval('h2.user-card-name', el => el.innerText );
console.log(text);
I had success using the following:
const browser = await puppeteer.launch();
try {
const page = await browser.newPage();
await page.goto(url);
await page.waitFor(2000);
let html_content = await page.evaluate(el => el.innerHTML, await page.$('.element-class-name'));
console.log(html_content);
} catch (err) {
console.log(err);
}
Hope it helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With