
How to use XPath in Chrome headless + Puppeteer evaluate()?

How can I use $x() to evaluate an XPath expression inside page.evaluate()?

Since page is not available in that context, I tried calling $x() directly (as I would in Chrome DevTools), but no luck.

The script times out.

MevatlaveKraspek asked Jan 25 '18 17:01



2 Answers

If you insist on using page.$x(), you can simply pass the result to page.evaluate():

const example = await page.evaluate(element => {
  return element.textContent;
}, (await page.$x('//*[@id="result"]'))[0]);
Grant Miller answered Oct 10 '22 22:10


$x() is not a standard JavaScript method for selecting elements by XPath. It is only a helper in Chrome DevTools, as the documentation states:

Note: This API is only available from within the console itself. You cannot access the Command Line API from scripts on the page.

Code run through page.evaluate() counts as a "script on the page", so $x() is not available there.

You have two options:

  1. Use document.evaluate

Here is an example of selecting an element (the featured article) inside page.evaluate():

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://en.wikipedia.org', { waitUntil: 'networkidle2' });

    const text = await page.evaluate(() => {
        // $x() is not part of standard JavaScript;
        // it is only a convenience helper in Chrome DevTools,
        // so use document.evaluate() instead
        const featureArticle = document
            .evaluate(
                '//*[@id="mp-tfa"]',
                document,
                null,
                XPathResult.FIRST_ORDERED_NODE_TYPE,
                null
            )
            .singleNodeValue;

        return featureArticle.textContent;
    });

    console.log(text);
    await browser.close();
})();
  2. Select the element with Puppeteer's page.$x() and pass it to page.evaluate()

This example achieves the same result as the first one:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://en.wikipedia.org', { waitUntil: 'networkidle2' });

    // page.$x() resolves to an array of ElementHandle objects;
    // we are only interested in the first element
    const featureArticle = (await page.$x('//*[@id="mp-tfa"]'))[0];
    // the same as:
    // const featureArticle = await page.$('#mp-tfa');

    const text = await page.evaluate(el => {
        // do what you want with featureArticle in page.evaluate
        return el.textContent;
    }, featureArticle);

    console.log(text);
    await browser.close();
})();
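
Both examples assume the XPath actually matches something. If it does not, document.evaluate() gives back a result whose singleNodeValue is null, and reading textContent from it throws. A minimal sketch of a guarded lookup follows; the name firstTextByXPath and the optional doc parameter are hypothetical, added here only so the function can also be exercised outside a browser:

```javascript
// Hypothetical helper (not part of Puppeteer or the DOM API).
// Returns the text of the first node matching `expression`,
// or null when nothing matches.
function firstTextByXPath(expression, doc) {
    // Inside a page, fall back to the global document.
    const root = doc || document;
    const result = root.evaluate(
        expression,
        root,
        null,
        9, // numeric value of XPathResult.FIRST_ORDERED_NODE_TYPE
        null
    );
    const node = result.singleNodeValue;
    return node ? node.textContent : null;
}
```

Because the function does not reference anything from the Node.js scope, it should be possible to hand it to Puppeteer directly, e.g. const text = await page.evaluate(firstTextByXPath, '//*[@id="mp-tfa"]'); the doc argument is then left undefined and the page's own document is used.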

A related question covers how to inject an $x() helper function into your scripts.
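
For illustration, an $x()-like helper can be approximated with the standard DOM XPath API. This is only a sketch and not the code from the related question; the name xpathAll and its optional doc parameter are assumptions made for this example:

```javascript
// Hypothetical stand-in for DevTools' $x(), built on document.evaluate().
// Returns an array with every node matching `expression`, in document order.
function xpathAll(expression, contextNode, doc) {
    // Inside a page, fall back to the global document.
    const root = doc || document;
    const snapshot = root.evaluate(
        expression,
        contextNode || root,
        null,
        7, // numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
        null
    );
    const nodes = [];
    for (let i = 0; i < snapshot.snapshotLength; i++) {
        nodes.push(snapshot.snapshotItem(i));
    }
    return nodes;
}
```

One way to make it available in the page (again, an assumption rather than the linked approach) would be await page.addScriptTag({ content: xpathAll.toString() }), after which xpathAll() can be called inside page.evaluate(). DOM nodes cannot be returned from page.evaluate() as plain values, so map them to something serializable such as textContent before returning.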

Everettss answered Oct 10 '22 23:10