SITUATION:
Here is what I want to do:
1) I load page 0. Page 0 contains clickable links to different pages. I want to load the content of all those pages. So:
2) Click on the first link. Load page 1. Get Data. Go back to the previous page (Page 0)
3) Click on the second link, which loads page 2, and so on until all links have been clicked.
With my current code, page 0 loads, then the first link is clicked and loads page 1, then there is a crash with the following error:
(node:2629) UnhandledPromiseRejectionWarning: Error: Protocol error (Runtime.callFunctionOn): Execution context was destroyed.
QUESTION:
What am I doing wrong, and how can I make my script behave the way I intended?
CODE:
const puppeteer = require('puppeteer');
const fs = require('fs');
let getData = async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('url', { waitUntil: 'networkidle2' });
  await page.setViewport({width: ..., height:...});
  const result = await page.evaluate(async () => {
    let data = [];
    let elements = document.querySelector('.items').querySelectorAll('.item');
    for (const element of elements) {
      element.click();
      await new Promise((resolve) => setTimeout(resolve, 2000));
      // GETTING THE DATA THEN PUSHING IT INTO THE DATA ARRAY
      await page.goBack();
    }
    return data; // Return our data array
  });
  browser.close();
  return result; // Return the data
};
OK, here's my take on this. Firstly, you're using the evaluate method incorrectly, mainly because you don't actually need it but also because you're asking it to do something it can't do. Just to explain: the evaluate method operates in the context of your web page only. It pretty much only allows you to execute JavaScript instructions directly on the current page in the remote browser. It has no concept of variables that you've declared outside that function, so in this case, when you do this:
await page.goBack();
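The evaluate method has no idea what page is nor how to use it. Now, there are ways to inject page into the evaluate method, but that won't resolve your problem either. Puppeteer API calls simply won't work inside an evaluate method (I've tried this myself and it always throws an exception). On top of that, element.click() triggers a navigation, which destroys the page context that evaluate is running in; that is what the "Execution context was destroyed" error is reporting.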
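To make that boundary concrete, here is a minimal sketch of how values can cross from Node into the page. The helper name countItems and the '.item' selector in the usage comment are only illustrative assumptions, and page is assumed to be a Puppeteer Page that has already been navigated somewhere.

```javascript
// Hypothetical helper: counts the elements matching `selector` on the current page.
// `page` is assumed to be a Puppeteer Page object created elsewhere.
async function countItems(page, selector) {
  // The callback runs inside the browser, so it can use `document`,
  // but it cannot see Node-side variables such as `page` or `selector`.
  // Serializable values have to be passed in as extra arguments.
  return page.evaluate((sel) => document.querySelectorAll(sel).length, selector);
}

// Usage (inside an async function, after page.goto(...)):
//   const total = await countItems(page, '.item');
```

Only serializable values (strings, numbers, plain objects) survive the trip in either direction, which is why the result here is a number rather than the elements themselves.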
So now let's get back to the problem you do have. What you're doing in the evaluate function is retrieving one UI element with class .items and then searching for every UI element within it with class .item. You're then looping through all of the found UI elements, clicking on each one, grabbing some kind of data and then going back to click on the next one.
You can achieve all of this without ever using the evaluate method and, instead, using Puppeteer API calls as follows:
const itemsList = await page.$('.items'); // '.$' is the Puppeteer equivalent of 'querySelector'
const elements = await itemsList.$$('.item'); // '.$$' is the Puppeteer equivalent of 'querySelectorAll'
const data = [];
// for...of is used instead of forEach so that each await finishes before the next click
for (const element of elements) {
  await element.click();
  // Get the data you want here and push it into the data array
  await page.goBack();
}
Hope this helps you out!
Instead of navigating back and forth to click the next link from the first page, it would make better sense to store the links from the first page in an array, and then open them one at a time with page.goto().
In other words, you can accomplish this task using the following example:
await page.goto('https://example.com/page-1');
const urls = await page.evaluate(() => Array.from(document.querySelectorAll('.link'), element => element.href));
for (let i = 0, total_urls = urls.length; i < total_urls; i++) {
  await page.goto(urls[i]);
  // Get the data ...
}
@AJC24's answer did not work for me. The problem was that the page's execution context was destroyed when clicking through and coming back to the original page.
What I ended up having to do was something similar to what Grant suggested. I collected all of the button identifiers in an array and upon going back to the original page I would click in again.
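A rough sketch of that idea, under some assumptions: the helper name scrapeByClicking, its parameters, and the extract callback are hypothetical, page is a Puppeteer Page already on the list page, every click triggers a full navigation, and the list keeps the same order after going back. The elements are looked up again on every pass instead of being reused, because handles obtained before a navigation belong to the destroyed context.

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page on the list page.
// `extract` is a caller-supplied async function that reads data from the detail page.
async function scrapeByClicking(page, itemSelector, extract) {
  const data = [];
  // Count the items once; the handles themselves are not kept across navigations.
  const total = (await page.$$(itemSelector)).length;
  for (let i = 0; i < total; i++) {
    // Re-query after every navigation: handles from the old context are stale.
    const items = await page.$$(itemSelector);
    if (!items[i]) break; // the list changed; stop rather than click the wrong item
    // Start waiting for the navigation before clicking, to avoid a race.
    await Promise.all([page.waitForNavigation(), items[i].click()]);
    data.push(await extract(page));
    await page.goBack();
  }
  return data;
}

// Usage (inside an async function):
//   const titles = await scrapeByClicking(page, '.items .item', (p) => p.title());
```

If the clicks do not cause a real navigation (for example in a single-page app), page.waitForNavigation() would time out, and waiting for a selector on the detail view would be a better fit.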
Using the iteration from @Grant's answer, I still got this error:
Execution context was destroyed, most likely because of a navigation.
I then made it open a new tab on each iteration, and that solved the problem:
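for (let i = 0, total_urls = urls.length; i < total_urls; i++) {
  const page = await browser.newPage(); // a fresh tab has its own execution context
  // The options object belongs inside the goto() call; `url` is assumed to be defined earlier
  await page.goto(url, { waitUntil: 'networkidle0', timeout: 0 });
  await page.goto(urls[i]);
  // Get the data ...
  await page.close(); // close the tab so they do not pile up
}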