
Headless Chrome (Puppeteer) - how to get access to document node element?

I'm using PhantomJS to parse some content and extract information from it (for example, the maximum image size on the page). I've decided to move to Puppeteer, and I ran into an issue: my functions that ran under PhantomJS worked with document node elements. As I understand it, in Puppeteer it's impossible to return a node element from page.evaluate() and similar functions. Is there another way to overcome this problem, or do I have to use another library? Thank you!

Brissy asked Jan 14 '18 21:01




1 Answer

There are two environments to consider when using Puppeteer:

  1. Node.js Environment
  2. Page DOM Environment

The Node.js environment is built upon Google's Chrome V8 JavaScript engine.

Chrome V8 describes its relation to the DOM:

JavaScript is most commonly used for client-side scripting in a browser, being used to manipulate Document Object Model (DOM) objects for example. The DOM is not, however, typically provided by the JavaScript engine but instead by a browser. The same is true of V8—Google Chrome provides the DOM. V8 does however provide all the data types, operators, objects and functions specified in the ECMA standard.

In other words, the DOM is not provided by default to Node.js.

This means that Node.js does not have the capability to interpret DOM elements on its own.
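A quick way to see this for yourself is to check for the global in a plain Node.js script. This is only a sketch; hasDom is a hypothetical helper name, and it assumes no DOM library has been loaded into the process:

```javascript
// Hypothetical helper: reports whether a global `document` exists
// in the environment this script is running in.
function hasDom() {
  return typeof document !== 'undefined';
}

// Under plain Node.js (no DOM library loaded) this prints false.
// The same check inside a browser page would print true.
console.log(hasDom());
```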

This is where Puppeteer comes in.

The Puppeteer function page.evaluate() allows you to evaluate an expression in the current Page DOM context using Chrome or Chromium.

The Puppeteer documentation describes what happens when you attempt to return a non-serializable value, like a DOM element:

If the function passed to the page.evaluate returns a non-Serializable value, then page.evaluate resolves to undefined.

Again, this is because Node.js does not know how to interpret DOM elements without help.
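For the use case in the question (the largest image on the page), one way around this is to do the DOM work inside page.evaluate() and return only plain data. The following is a minimal sketch rather than a definitive implementation: findLargestImage and getLargestImage are hypothetical names, and page is assumed to be a Puppeteer Page that has already navigated to the target URL.

```javascript
// Runs in the page (browser) context: `document` exists there, not in Node.js.
// Returns a plain object, which is serializable, instead of the <img> node.
function findLargestImage() {
  let largest = null;
  for (const img of Array.from(document.images)) {
    const area = img.naturalWidth * img.naturalHeight;
    if (largest === null || area > largest.area) {
      largest = {
        src: img.src,
        width: img.naturalWidth,
        height: img.naturalHeight,
        area,
      };
    }
  }
  return largest; // null when the page has no images
}

// Runs in Node.js. `page` is assumed to be an already-navigated Puppeteer Page.
async function getLargestImage(page) {
  return page.evaluate(findLargestImage);
}
```

With a real page this would be called as `const image = await getLargestImage(page);` after `page.goto(...)` has resolved. The node itself never leaves the browser; only its src and dimensions cross over to Node.js.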

As a result, Puppeteer has implemented an ElementHandle class which represents an in-page DOM element.

You can use elementHandle.$(), elementHandle.$$(), or elementHandle.$x() to return ElementHandles back to Node.js.

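An ElementHandle is not a serialized copy of the element. It is a reference, held in Node.js, to the element that still lives in the page, which is why it can be returned to and passed around in the Node.js environment.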

Therefore, if you need to manipulate an element directly, you can do so inside page.evaluate(). If you need to access a representation of an element, use page.$() or one of its related functions.
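The two approaches can also be combined: obtain a handle with page.$() and pass it back into page.evaluate(), where it resolves to the real element again. A minimal sketch, assuming page is a Puppeteer Page; describeFirstMatch and the selector argument are hypothetical names chosen for illustration.

```javascript
// Runs in Node.js. `page` is assumed to be a Puppeteer Page.
async function describeFirstMatch(page, selector) {
  const handle = await page.$(selector); // ElementHandle, or null if nothing matches
  if (handle === null) {
    return null;
  }
  try {
    // The handle is passed as an argument; inside the page it is the real element.
    // Only the plain object built here is sent back to Node.js.
    return await page.evaluate(
      (el) => ({ tag: el.tagName, text: el.textContent.trim() }),
      handle
    );
  } finally {
    await handle.dispose(); // release the in-page reference once done
  }
}
```

For example, `await describeFirstMatch(page, 'h1')` would resolve to the tag name and trimmed text of the first h1 on the page, or to null if there is none.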

Grant Miller answered Sep 20 '22 12:09