I'm trying to achieve something very trivial: Get a list of elements, and then do something with the innerText
of each element.
const tweets = await page.$$('.tweet');
From what I can tell, this returns a NodeList, just like the document.querySelectorAll() method in the browser.
How do I just loop over it and get what I need? I tried various stuff, like:
[...tweets].forEach(tweet => {
console.log(tweet.innerText)
});
page.$eval() gets a value from a single element in Puppeteer. It takes two arguments: the first is a selector, and the second is a function that receives the matched element, for example element => element.innerText.
To get an element's text in Puppeteer, you can read its textContent property by passing the property name 'textContent' to the element handle's getProperty() method.
You can select elements by class in Puppeteer, but the class name has to be written as a CSS selector: put a . (dot) before the class name to mark it as a class.
page.evaluate() evaluates a function in the page's context and returns the result. If the function passed to page.evaluate() returns a Promise, page.evaluate() waits for the promise to resolve and returns its value.
You can use a combination of elementHandle.getProperty() and jsHandle.jsonValue() to obtain the innerText from an ElementHandle obtained with page.$$():
const tweets = await page.$$('.tweet');
for (let i = 0; i < tweets.length; i++) {
  const tweet = await (await tweets[i].getProperty('innerText')).jsonValue();
  console.log(tweet);
}
If you are set on using the forEach() method, you can wrap the loop in a promise:
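const tweets = await page.$$('.tweet');
await new Promise((resolve, reject) => {
  // Resolve right away if nothing matched; otherwise the promise would never settle.
  if (tweets.length === 0) return resolve();
  let remaining = tweets.length;
  tweets.forEach(async (tweet) => {
    try {
      const text = await (await tweet.getProperty('innerText')).jsonValue();
      console.log(text);
      // Resolve once every callback has finished, not just the last-indexed one.
      if (--remaining === 0) resolve();
    } catch (error) {
      reject(error);
    }
  });
});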
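If forEach() itself is not a requirement, Promise.all() with map() is a common way to wait for every element without hand-rolling the promise. The following is only a sketch: innerTexts and fakeHandle are hypothetical names, and fakeHandle is a stand-in that mimics just getProperty() and jsonValue() so the snippet runs without a browser. With Puppeteer you would pass the array returned by page.$$() instead.

```javascript
// Hypothetical helper: read innerText from every handle concurrently.
// `handles` is assumed to be the array returned by page.$$().
async function innerTexts(handles) {
  return Promise.all(
    handles.map(async (handle) => {
      const property = await handle.getProperty('innerText');
      return property.jsonValue();
    })
  );
}

// Stand-in for an ElementHandle (an assumption for illustration only).
// It resolves after a random delay to imitate out-of-order completion.
function fakeHandle(text) {
  return {
    getProperty: async (name) => ({
      jsonValue: async () => {
        await new Promise((done) => setTimeout(done, Math.random() * 20));
        return name === 'innerText' ? text : undefined;
      },
    }),
  };
}

innerTexts([fakeHandle('first'), fakeHandle('second')]).then((texts) => {
  // Promise.all keeps the results in the same order as the handles.
  console.log(texts);
});
```

Promise.all() rejects as soon as any single read fails, so an error in one element surfaces to the caller instead of being lost inside a forEach() callback.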
Alternatively, you can skip using page.$$() entirely, and use page.evaluate():
const tweets = await page.evaluate(() => Array.from(document.getElementsByClassName('tweet'), e => e.innerText));
tweets.forEach(tweet => {
  console.log(tweet);
});
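According to the Puppeteer docs, page.$$() does not return a NodeList. Instead, it returns a Promise that resolves to an array of ElementHandle objects, which is very different from a NodeList: each ElementHandle is a reference to an element inside the page, not the element itself, so it has no innerText property to read directly.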
There are several ways to solve the problem.
page.$$eval
This method runs Array.from(document.querySelectorAll(selector)) within the page and passes it as the first argument to pageFunction.
So getting the innerText of each element looks like this:
// Find all .tweet elements and return the innerText of each one in an array.
const tweets = await page.$$eval('.tweet', elements => elements.map(element => element.innerText));
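Because page.$$eval() hands pageFunction the whole array of matched elements, a callback that reads a property straight off its argument gets undefined. The sketch below illustrates this with plain objects standing in for DOM elements, so it runs outside a browser. The names matched, readDirectly, and mapEach are made up for the illustration.

```javascript
// Plain objects stand in for the DOM elements that $$eval would collect.
const matched = [{ innerText: 'first tweet' }, { innerText: 'second tweet' }];

// Treating the argument as a single element reads a property of the array.
const readDirectly = (element) => element.innerText;

// Mapping over the array returns one string per element.
const mapEach = (elements) => elements.map((element) => element.innerText);

console.log(readDirectly(matched)); // undefined
console.log(mapEach(matched)); // [ 'first tweet', 'second tweet' ]
```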
Passing an elementHandle to page.evaluate
Whatever you get from await page.$$('.tweet') is an array of ElementHandle objects. If you console.log one, it shows up as JSHandle or ElementHandle, depending on the type.
Forget the hard explanation, it's easier to demonstrate.
// let's just call them tweetHandles
const tweetHandles = await page.$$('.tweet');
// loop through all handles
for (const tweetHandle of tweetHandles) {
  // pass the single handle below
  const singleTweet = await page.evaluate(el => el.innerText, tweetHandle);
  // do whatever you want with the data
  console.log(singleTweet);
}
Of course, there are multiple ways to solve this problem. Grant Miller also covered a few of them in the other answer.