
page.evaluate Vs. Puppeteer $ methods

I'm interested in the differences of these two blocks of code.

const $anchor = await page.$('a.buy-now');
const link = await $anchor.getProperty('href');
await $anchor.click();

await page.evaluate(() => {
    const $anchor = document.querySelector('a.buy-now');
    const text = $anchor.href;
    $anchor.click();
});

I've generally found raw DOM elements in page.evaluate() easier to work with, and the ElementHandles returned by the $ methods an abstraction too far.

However, I wondered whether the async Puppeteer methods might be more performant or more reliable. I couldn't find any guidance on this in the docs and would be interested in learning more about the pros and cons of each approach and the motivation behind adding methods like page.$$().

lpoulter asked Apr 13 '19 10:04





1 Answer

The main difference between those two snippets is how much interaction takes place between the Node.js environment and the browser environment.

The first code snippet will do the following:

  • Run document.querySelector in the browser and return the element handle (to the Node.js environment)
  • Run getProperty on the handle and return the result (to the Node.js environment)
  • Click an element inside the browser

The second code snippet simply does this:

  • Run the given function in the browser context (and return results to the Node.js environment)
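To make the round trips of the first snippet concrete, here is a minimal sketch written as a function. The name readHrefAndClick and its parameters are made up for illustration, and page is assumed to be a Puppeteer Page. Note that getProperty resolves to a JSHandle rather than the string itself, so getting the actual href into Node.js takes one more call (jsonValue).

```javascript
// Hypothetical helper (not from the original post); `page` is assumed to be a Puppeteer Page.
async function readHrefAndClick(page, selector) {
  // Round trip: query the element in the browser and get an ElementHandle back.
  const anchor = await page.$(selector);
  if (anchor === null) return null; // page.$ resolves to null when nothing matches

  // Round trip: getProperty resolves to a JSHandle, not to the value itself.
  const hrefHandle = await anchor.getProperty('href');
  // Round trip: serialize the handle's value back into Node.js.
  const href = await hrefHandle.jsonValue();

  // Round trip(s): click the element inside the browser.
  await anchor.click();
  return href;
}
```

Every await here crosses from Node.js to the browser and back, which is the interaction the list above describes.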

Performance

Regarding performance, remember that Puppeteer communicates with the browser over a WebSocket connection. The second snippet will therefore run faster, because only one command is sent to the browser (in contrast to three).

This can make a big difference if the browser you are connecting to runs on a different machine (connected via puppeteer.connect). If the script and the browser run on the same machine, the difference will likely be only a few milliseconds.
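As a rough sketch of the single-command variant, the lookup, the read and the click can all live in one page.evaluate call, with the selector passed in as an argument. The helper names readHrefAndClickInPage and timeMs are hypothetical, and page is assumed to be a Puppeteer Page. The timing helper is only there so you can measure the difference on your own setup rather than take the numbers on faith.

```javascript
// Hypothetical helper: lookup, read and click in one page.evaluate call.
async function readHrefAndClickInPage(page, selector) {
  return page.evaluate((sel) => {
    // This callback runs in the browser, so `document` is the page's DOM.
    const anchor = document.querySelector(sel);
    if (!anchor) return null;
    const href = anchor.href;
    anchor.click();
    return href; // a plain string, serialized back to Node.js
  }, selector);
}

// Hypothetical timing helper: resolves to the elapsed time of `fn` in milliseconds.
async function timeMs(fn) {
  const start = process.hrtime.bigint();
  await fn();
  return Number(process.hrtime.bigint() - start) / 1e6;
}
```

Usage could look like `const ms = await timeMs(() => readHrefAndClickInPage(page, 'a.buy-now'));`, run once against a local browser and once against a remote one.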

Advantage of using element handles

Using element handles has some advantages. For one, functions like elementHandle.click behave in a more "human-like" way than document.querySelector('...').click(). For example, Puppeteer moves the mouse to the element and clicks in its center instead of just executing the click function.
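The two kinds of click can be put side by side as follows. This is only a sketch: clickLikeUser and clickViaDom are hypothetical names, and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helpers; `page` is assumed to be a Puppeteer Page.

// Mouse-driven click: Puppeteer scrolls the element into view if needed
// and clicks in the center of the element.
async function clickLikeUser(page, selector) {
  const handle = await page.$(selector);
  if (handle === null) throw new Error(`No element matches ${selector}`);
  await handle.click();
}

// Scripted click: calls the DOM click() method directly, with no mouse involved.
async function clickViaDom(page, selector) {
  await page.$eval(selector, (el) => el.click());
}
```

One practical consequence, as far as I can tell: the scripted click fires even when the element is not visible, whereas the mouse-driven click depends on the element actually being clickable on screen.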

When to use what

In general, I recommend using page.evaluate whenever possible, as this API is also a lot easier to debug. When an error happens, you can reproduce it by opening the DevTools in your Chrome browser and rerunning the same lines there. If you mix a lot of page.$ statements together, it can be much harder to understand what the problem is and whether it happened inside the Node.js runtime or the browser runtime.

Use element handles if you need the element for longer (for example, because you have to make some complex calculations or wait for an external event before you can extract information from it).
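A minimal sketch of that longer-lived case, assuming the "external event" is represented by any promise on the Node.js side (a timer, a message from another service, and so on). The name readTextAfter and its parameters are hypothetical, and page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper: keep a handle while waiting for something outside the page.
async function readTextAfter(page, selector, externalEvent) {
  const handle = await page.$(selector);
  if (handle === null) return null;
  try {
    // Wait in Node.js; the handle keeps pointing at the same DOM element meanwhile.
    await externalEvent;
    // Pass the handle back into the browser, where it arrives as the element itself.
    return await page.evaluate((el) => el.textContent, handle);
  } finally {
    // Release the handle once it is no longer needed.
    await handle.dispose();
  }
}
```

Disposing the handle in a finally block is a design choice of this sketch, so the reference is released even if the wait or the read throws.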

Thomas Dondorf answered Oct 23 '22 13:10
