Getting DOM node text with Puppeteer and headless Chrome

Tags:

node.js

puppeteer

google-chrome-headless

I'm trying to use headless Chrome and Puppeteer to run our JavaScript tests, but I can't extract the results from the page. Based on this answer, it looks like I should use page.evaluate(). That section even has an example that looks like what I need.

const bodyHandle = await page.$('body');
const html = await page.evaluate(body => body.innerHTML, bodyHandle);
await bodyHandle.dispose();

As a full example, I tried to convert that to a script that will extract my name from my user profile on Stack Overflow. Our project is using Node 6, so I converted the await expressions to use .then().

const puppeteer = require('puppeteer');

puppeteer.launch().then(function(browser) {
    browser.newPage().then(function(page) {
        page.goto('https://stackoverflow.com/users/4794').then(function() {
            page.$('h2.user-card-name').then(function(heading_handle) {
                page.evaluate(function(heading) {
                    return heading.innerText;
                }, heading_handle).then(function(result) {
                    console.info(result);
                    browser.close();
                }, function(error) {
                    console.error(error);
                    browser.close();
                });
            });
        });
    });
});

When I run that, I get this error:

$ node get_user.js 
TypeError: Converting circular structure to JSON
    at Object.stringify (native)
    at args.map.x (/mnt/data/don/git/Kive/node_modules/puppeteer/node6/helper.js:30:43)
    at Array.map (native)
    at Function.evaluationString (/mnt/data/don/git/Kive/node_modules/puppeteer/node6/helper.js:30:29)
    at Frame.<anonymous> (/mnt/data/don/git/Kive/node_modules/puppeteer/node6/FrameManager.js:376:31)
    at next (native)
    at step (/mnt/data/don/git/Kive/node_modules/puppeteer/node6/FrameManager.js:355:24)
    at Promise (/mnt/data/don/git/Kive/node_modules/puppeteer/node6/FrameManager.js:373:12)
    at fn (/mnt/data/don/git/Kive/node_modules/puppeteer/node6/FrameManager.js:351:10)
    at Frame._rawEvaluate (/mnt/data/don/git/Kive/node_modules/puppeteer/node6/FrameManager.js:375:3)

The problem seems to be with serializing the input parameter to page.evaluate(). I can pass in strings and numbers, but not element handles. Is the example wrong, or is it a problem with Node 6? How can I extract the text of a DOM node?
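Judging from the stack trace, the Node 6 build of Puppeteer builds a source string from the page function and calls `JSON.stringify` on each argument (`helper.js:30`, inside `evaluationString`). That means every argument has to survive JSON encoding. The sketch below is a simplified, hypothetical re-creation of that step, not Puppeteer's actual code. The self-referencing `fakeHandle` stands in for an element handle, on the assumption that a real handle holds a similar circular reference back to the page or session that owns it.

```javascript
// Hypothetical re-creation of how an evaluation string could be built:
// the page function's source text followed by JSON-encoded arguments.
function evaluationString(fn, ...args) {
  return '(' + fn + ')(' + args.map(x => JSON.stringify(x)).join(',') + ')';
}

const pageFunction = function(heading) { return heading.innerText; };

// Strings and numbers serialize without trouble.
console.log(evaluationString(pageFunction, 'h2.user-card-name', 42));

// Stand-in for an element handle: an object that refers back to itself.
const fakeHandle = { selector: 'h2.user-card-name' };
fakeHandle.owner = fakeHandle;

try {
  evaluationString(pageFunction, fakeHandle);
} catch (error) {
  // A TypeError about a circular structure, like the one in the trace above.
  console.error(error.name + ': ' + error.message.split('\n')[0]);
}
```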


asked Sep 13 '17 16:09

Don Kirkby

4 Answers

I found three solutions to this problem, depending on how complicated your extraction is. The simplest option is a related function that I hadn't noticed: page.$eval(). It does what I was trying to do in a single call, combining page.$() and page.evaluate(). Here's an example that works:

const puppeteer = require('puppeteer');

puppeteer.launch().then(function(browser) {
    browser.newPage().then(function(page) {
        page.goto('https://stackoverflow.com/users/4794').then(function() {
            page.$eval('h2.user-card-name', function(heading) {
                return heading.innerText;
            }).then(function(result) {
                console.info(result);
                browser.close();
            });
        });
    });
});

That gives me the expected result:

$ node get_user.js 
Don Kirkby top 2% overall

I wanted to extract something more complicated, but I finally realized that the evaluation function runs in the context of the page, not in Node. That means you can use any tools the page has already loaded, and just send serializable values like strings and numbers back and forth. In this example, I pass page.evaluate() a string of jQuery code, which works because the profile page already loads jQuery:

const puppeteer = require('puppeteer');

puppeteer.launch().then(function(browser) {
    browser.newPage().then(function(page) {
        page.goto('https://stackoverflow.com/users/4794').then(function() {
            page.evaluate("$('h2.user-card-name').text()").then(function(result) {
                console.info(result);
                browser.close();
            });
        });
    });
});

That gives me a result with the whitespace intact:

$ node get_user.js 

                            Don Kirkby

                                top 2% overall
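The same idea works without relying on jQuery being present. Because only serializable values cross between Node and the page, the selector can go into `page.evaluate()` as an extra string argument, and the page function can hand back plain text. This is a sketch: `getText` is a hypothetical helper name, it reads `textContent` (roughly what jQuery's `.text()` returns for a single node), collapses runs of whitespace like the ones above into single spaces, and resolves to `null` instead of failing when nothing matches.

```javascript
// Hypothetical helper: reads the text of the first node matching `selector`.
// Only the selector string goes into the page and only a string (or null)
// comes back, so no element handle has to be serialized.
function getText(page, selector) {
  return page.evaluate(function(sel) {
    // This function runs inside the page, not in Node.
    var node = document.querySelector(sel);
    if (node === null) {
      return null;
    }
    // Collapse the indentation and newlines from the page's markup.
    return node.textContent.replace(/\s+/g, ' ').trim();
  }, selector);
}
```

Inside the `page.goto()` callback, that could be used as `getText(page, 'h2.user-card-name').then(console.info)`.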

In my real script, I want to extract the text of several nodes, so I need a function instead of a simple string:

const puppeteer = require('puppeteer');

puppeteer.launch().then(function(browser) {
    browser.newPage().then(function(page) {
        page.goto('https://stackoverflow.com/users/4794').then(function() {
            page.evaluate(function() {
                return $('h2.user-card-name').text();
            }).then(function(result) {
                console.info(result);
                browser.close();
            });
        });
    });
});

That gives the exact same result. Now I need to add error handling, and maybe reduce the indentation levels.
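One way to handle both, sketched under the assumption that the project is still on Node 6 (so no `async`/`await` and no `Promise.prototype.finally`): return each promise from its `.then()` callback so the chain stays flat, and close the browser on both the success and the failure path. It also assumes a Puppeteer version that provides `page.$$eval()`, which runs the page function over every node matching the selector, so the text of several nodes comes back as one array. `extractTexts` is a hypothetical name, and the `puppeteer` object is passed in so the function can be exercised without launching Chrome.

```javascript
// Hypothetical helper: loads `url` and resolves to the trimmed text of every
// node matching `selector`. Written without async/await for Node 6.
function extractTexts(puppeteer, url, selector) {
  return puppeteer.launch().then(function(browser) {
    var page;
    return browser.newPage()
      .then(function(newPage) {
        page = newPage;
        return page.goto(url);
      })
      .then(function() {
        // $$eval passes an array of all matching elements to this function,
        // which runs in the page and returns plain strings.
        return page.$$eval(selector, function(nodes) {
          return nodes.map(function(node) {
            return node.textContent.trim();
          });
        });
      })
      .then(function(texts) {
        // Success: close the browser, then pass the results on.
        return browser.close().then(function() { return texts; });
      }, function(error) {
        // Failure: close the browser, then re-throw the original error.
        return browser.close().then(function() { throw error; });
      });
  });
}
```

With the real library, that could be called as `extractTexts(require('puppeteer'), 'https://stackoverflow.com/users/4794', 'h2.user-card-name')`, followed by a `.then()` that logs the array or the error.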

answered Oct 19 '22 01:10

Don Kirkby

Using await/async and $eval, the syntax looks like the following:

await page.goto('https://stackoverflow.com/users/4794')
// el is the DOM element; DOM elements have no text() method, so read innerText
const name = await page.$eval('h2.user-card-name', el => el.innerText)
console.log(name)

answered Oct 19 '22 01:10

RobLoach

I use page.$eval

const text = await page.$eval('h2.user-card-name', el => el.innerText);
console.log(text);

answered Oct 19 '22 00:10

vnguyen

I had success using the following:

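const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url); // url: the address of the page to scrape
    // Wait for the element itself instead of a fixed delay (page.waitFor is deprecated)
    await page.waitForSelector('.element-class-name');
    const html_content = await page.evaluate(el => el.innerHTML, await page.$('.element-class-name'));
    console.log(html_content);
  } catch (err) {
    console.log(err);
  } finally {
    await browser.close();
  }
})();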

Hope it helps.

answered Oct 19 '22 01:10

Darren Hall
