
Puppeteer: How to get the contents of each element of a nodelist?

I'm trying to achieve something very trivial: Get a list of elements, and then do something with the innerText of each element.

const tweets = await page.$$('.tweet');

From what I can tell, this returns a NodeList, just like document.querySelectorAll() in the browser.

How do I loop over it and get what I need? I tried various things, like:

[...tweets].forEach(tweet => {
  console.log(tweet.innerText)
});
i.brod asked Oct 16 '18 02:10





2 Answers

page.$$():

You can use a combination of elementHandle.getProperty() and jsHandle.jsonValue() to obtain the innerText from an ElementHandle obtained with page.$$():

const tweets = await page.$$('.tweet');

for (let i = 0; i < tweets.length; i++) {
  const tweet = await (await tweets[i].getProperty('innerText')).jsonValue();
  console.log(tweet);
}

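If you are set on using forEach(), keep in mind that it does not wait for async callbacks, so you have to wrap the loop in a promise that resolves only once every callback has finished:

const tweets = await page.$$('.tweet');

await new Promise((resolve, reject) => {
  let pending = tweets.length;
  if (pending === 0) resolve(); // nothing to wait for

  tweets.forEach(async (tweet) => {
    try {
      const text = await (await tweet.getProperty('innerText')).jsonValue();
      console.log(text);
      if (--pending === 0) resolve(); // last outstanding callback finished
    } catch (err) {
      reject(err);
    }
  });
});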
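
Another option, when the lookups do not have to run one after another, is to map each handle to a promise and await them together with Promise.all(), which avoids the hand-written promise wrapper. The following is only a minimal sketch and not part of the original answer: getInnerTexts is a hypothetical helper name, and page is assumed to be an already opened Puppeteer page.

```javascript
// Hypothetical helper (not part of Puppeteer): returns the innerText of every
// element matching `selector`. `page` is assumed to be a Puppeteer Page.
async function getInnerTexts(page, selector) {
  const handles = await page.$$(selector);

  // map() starts one lookup per handle; Promise.all() waits for all of them
  // and keeps the results in the same order as the handles.
  return Promise.all(
    handles.map(async (handle) => {
      const property = await handle.getProperty('innerText');
      return property.jsonValue();
    })
  );
}
```

It would be called as const tweets = await getInnerTexts(page, '.tweet'); and the result is a plain array of strings that can be used with forEach() as usual.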

page.evaluate():

Alternatively, you can skip using page.$$() entirely, and use page.evaluate():

const tweets = await page.evaluate(() => Array.from(document.getElementsByClassName('tweet'), e => e.innerText));

tweets.forEach(tweet => {
  console.log(tweet);
});
Grant Miller answered Sep 21 '22 19:09



According to the Puppeteer docs, page.$$() does not return a NodeList. It returns a Promise that resolves to an array of ElementHandle objects, which is very different from a NodeList.

There are several ways to solve the problem.

1. Using page.$$eval, the built-in method for working with a list of elements

This method runs Array.from(document.querySelectorAll(selector)) within the page and passes it as the first argument to pageFunction.

So getting the innerText of each element looks like this:

// Find all .tweet elements and return the innerText of each one, in an array.
const tweets = await page.$$eval('.tweet', elements => elements.map(element => element.innerText));
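
Because page.$$eval() hands the page function an array of elements rather than a single element, any per-element work has to happen inside a map() over that array. The following is a rough sketch under that assumption: getTrimmedTexts is a hypothetical helper name, page is assumed to be a Puppeteer page, and the trim() call is only there to illustrate that the processing runs inside the browser.

```javascript
// Hypothetical helper (not part of Puppeteer): `page` is assumed to be a
// Puppeteer Page.
async function getTrimmedTexts(page, selector) {
  // The callback runs inside the browser and receives an array of elements,
  // so it maps over that array and returns plain, serializable strings.
  return page.$$eval(selector, (elements) =>
    elements.map((el) => el.innerText.trim())
  );
}
```

Only serializable values such as strings, numbers, and plain objects can be returned this way; DOM elements themselves cannot be passed back to Node.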

2. Pass the ElementHandle to page.evaluate

Whatever you get from await page.$$('.tweet') is an array of ElementHandle objects. If you log one to the console, it will show up as a JSHandle or ElementHandle, depending on the type.

Forget the hard explanation, it's easier to demonstrate.

// get a handle for every .tweet element
const tweetHandles = await page.$$('.tweet');

// loop through all handles
for (const tweetHandle of tweetHandles) {
  // pass the single handle as an argument; inside the page it becomes the element
  const singleTweet = await page.evaluate(el => el.innerText, tweetHandle);

  // do whatever you want with the data
  console.log(singleTweet);
}
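
More recent Puppeteer releases also expose an evaluate() method on the handle itself, which makes the loop above slightly shorter. This is only a sketch, assuming a Puppeteer version in which handle.evaluate() is available; collectTexts is a hypothetical helper name.

```javascript
// Hypothetical helper (not part of Puppeteer): collects innerText from each
// handle, one at a time. Assumes each handle supports handle.evaluate().
async function collectTexts(handles) {
  const texts = [];
  for (const handle of handles) {
    // The element behind the handle is passed to the callback inside the page.
    texts.push(await handle.evaluate((el) => el.innerText));
  }
  return texts;
}
```

It would be used as const texts = await collectTexts(await page.$$('.tweet'));.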

Of course, there are multiple ways to solve this problem; Grant Miller also covered a few of them in the other answer.

Md. Abu Taher answered Sep 19 '22 19:09
