My Node.js puppeteer script fills out a form successfully, but the page only accepts a "click" event on an element some of the time before returning the modified page content. Here's the script:
const fetchContracts = async (url) => {
  const browser = await pupeteer.launch({ headless: true, args: ['--no-sandbox', '--disable-setuid-sandbox']});
  const page = await browser.newPage();
  const pendingXHR = new PendingXHR(page);
  await page.goto(url, { waitUntil: 'networkidle2' });
  await Promise.all([
    page.click("#agree_statement"),
    page.waitForNavigation()
  ]);
  await page.click(".form-check-input");
  await Promise.all([
    page.click(".btn-primary"),
    page.waitForNavigation()
  ]);
  /// MY PROBLEM OCCURS HERE
  /// Sometimes these clicks do not register....
  await page.click('#filedReports th:nth-child(5)')
  await pendingXHR.waitForAllXhrFinished();
  await page.click('#filedReports th:nth-child(5)');
  await pendingXHR.waitForAllXhrFinished();
  /// And my bot skips directly here....
  let html = await page.content();
  await page.close();
  await browser.close();
  return html;
}
The "pendingXHR" module is an import, which I pull in up top in my code from this library:
const { PendingXHR } = require('pending-xhr-puppeteer');
The script works on my local computer, and works some of the time when I upload the script to Digital Ocean. According to the page that I am crawling, these clicks initiate XHR requests, which I am attempting to wait for. Here's proof:
So my question is:
Why would these clicks not register, even though I am awaiting them and awaiting the XHR requests, before the html is pulled from the page and then returned? And why the inconsistency with this, where sometimes the clicks are registered and sometimes they are not?
Thanks for your help.
Short answer: The click triggers a delayed AJAX request, so pendingXHR.waitForAllXhrFinished() resolves immediately because no requests are in flight at the time the function is executed. Use page.waitForResponse('.../data/') instead.
You are expecting the following sequence of events:
pendingXHR.waitForAllXhrFinished() is executed
pendingXHR.waitForAllXhrFinished() resolves
page.content() is executed
The problem is that the library you are using (PendingXHR) waits only for the requests that are pending at the moment it is called and resolves as soon as they have finished. This does not work in two cases that I can think of:
1. The AJAX request is started asynchronously
In this case, the order of the events would be like this:
pendingXHR.waitForAllXhrFinished() is executed
pendingXHR.waitForAllXhrFinished() resolves immediately (as there are no requests)
page.content() is executed (too early!)
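To make the first case concrete, here is a small simulation in plain Node.js, with no browser involved. The PendingTracker class and the simulate function are hypothetical stand-ins written for this illustration. They are not the pending-xhr-puppeteer implementation, only a sketch of the "wait for whatever is pending right now" behaviour described above.

```javascript
// Hypothetical stand-in for a "wait for the currently pending requests" helper.
class PendingTracker {
  constructor() {
    this.pending = new Set();
  }

  // Remember a request until it settles.
  track(promise) {
    this.pending.add(promise);
    const done = () => this.pending.delete(promise);
    promise.then(done, done);
    return promise;
  }

  // Resolves once the requests that are pending *right now* have settled.
  async waitForAllFinished() {
    await Promise.allSettled([...this.pending]);
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Simulates a click whose handler starts its request after `startDelay` ms
// (0 means the request starts synchronously inside the click handler).
async function simulate(startDelay) {
  const tracker = new PendingTracker();
  let table = 'unsorted';

  const startRequest = () =>
    tracker.track(sleep(20).then(() => { table = 'sorted'; }));

  // The "click": start the request now, or schedule it for later.
  if (startDelay === 0) startRequest();
  else setTimeout(startRequest, startDelay);

  await tracker.waitForAllFinished();
  return table; // what page.content() would capture at this point
}
```

With simulate(0) the request is already pending when the wait begins, so the result is 'sorted'. With simulate(10) nothing is pending yet, the wait resolves at once, and the result is 'unsorted'.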
2. The UI modifies the table asynchronously
In this case, the order of the events would be like this:
pendingXHR.waitForAllXhrFinished() is executed
pendingXHR.waitForAllXhrFinished() resolves
page.content() is executed (too early!)
The inconsistency arises because this is a race: a millisecond can decide which event happens first, so sometimes the events happen to occur in the right order.
Without looking at the code of the page, I cannot say for sure which case it is (it might actually be both), but I would guess it is the first one, as I can well imagine the table library waiting for any double clicks, dragging, etc. before it makes the AJAX request.
The first problem can be fixed by using page.waitForResponse instead of pendingXHR.waitForAllXhrFinished, as this makes sure that the request to data/ has actually happened.
Fixing the second case (if necessary) is not as trivial, but can be done by introducing a fixed waiting time (here 10 milliseconds) with page.waitFor(10).
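As an alternative to a fixed delay (a different technique, named plainly: waiting on a DOM condition), one could wait until the element's text actually changes. The sketch below assumes the click alters the text content of the watched element. If the text stays the same (for example, the table was already in that order), the wait will time out. The helper name waitForTextChange and its parameters are made up for illustration; page.$eval and page.waitForFunction are the Puppeteer calls it relies on.

```javascript
// Sketch: run `action`, then wait until the text of `selector` differs
// from what it was before. Hypothetical helper, not part of Puppeteer.
async function waitForTextChange(page, selector, action, timeout = 5000) {
  // Snapshot the text before triggering the change.
  const before = await page.$eval(selector, (el) => el.textContent);

  await action();

  // Poll inside the page until the text is no longer the snapshot.
  await page.waitForFunction(
    (sel, previous) => {
      const el = document.querySelector(sel);
      return el !== null && el.textContent !== previous;
    },
    { timeout },
    selector,
    before
  );
}
```

Usage might look like waitForTextChange(page, '#filedReports', () => page.click('#filedReports th:nth-child(5)')), assuming that '#filedReports' contains the rows that get rerendered.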
By fixing both cases, the new code looks like this:
await Promise.all([ // wait for the response to happen and click
  page.waitForResponse('.../data/'), // use the actual URL here
  page.click('...'),
]);
await page.waitFor(10); // wait for any asynchronous rerenders that might happen
let html = await page.content();
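Since the question clicks the same header twice, the pattern could be wrapped in a small helper. This is only a sketch: clickAndWaitForResponse, urlPart and the default timeout are names and values chosen for illustration, and the predicate form of page.waitForResponse is used so that part of the URL is enough to match.

```javascript
// Sketch: register the response wait *before* the click takes effect,
// then wait for both. Hypothetical helper, not part of Puppeteer.
async function clickAndWaitForResponse(page, selector, urlPart, timeout = 30000) {
  const [response] = await Promise.all([
    // Resolves with the first response whose URL contains `urlPart`.
    page.waitForResponse((res) => res.url().includes(urlPart), { timeout }),
    page.click(selector),
  ]);
  return response;
}
```

Each click in the question would then become one call, for example await clickAndWaitForResponse(page, '#filedReports th:nth-child(5)', 'data/'), where 'data/' stands in for whatever the real request URL contains.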
Did you try a workaround like this:
await page.waitFor(1000); // this line will wait for 1 second
This way you can be sure that it has loaded. A better way is to put the page.click in a Promise.all, like this:
await Promise.all([
  page.click('#filedReports th:nth-child(5)'),
  pendingXHR.waitForAllXhrFinished()
]);
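One detail about Promise.all is worth spelling out: the array elements should be the promises themselves. If each element has its own await in front of it, the first operation finishes before the second one even starts, and Promise.all only receives the already resolved values. The following sketch shows the difference in plain Node.js; task is a made-up stand-in for page.click and the wait call.

```javascript
// Stand-in for an async operation; records when it starts and ends.
function task(label, log) {
  log.push(`start ${label}`);
  return new Promise((resolve) =>
    setTimeout(() => {
      log.push(`end ${label}`);
      resolve();
    }, 10)
  );
}

// `await` inside the array: the tasks run one after the other.
async function awaitedInsideArray() {
  const log = [];
  await Promise.all([await task('click', log), await task('wait', log)]);
  return log;
}

// Plain promises in the array: both tasks start before either finishes.
async function startedTogether() {
  const log = [];
  await Promise.all([task('click', log), task('wait', log)]);
  return log;
}
```

awaitedInsideArray() logs start click, end click, start wait, end wait, whereas startedTogether() logs start click, start wait, end click, end wait.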
PS: you are missing a semi-colon at
/// MY PROBLEM OCCURS HERE
/// Sometimes these clicks do not register....
\/
await page.click('#filedReports th:nth-child(5)')
await pendingXHR.waitForAllXhrFinished(); /\
await page.click('#filedReports th:nth-child(5)');
await pendingXHR.waitForAllXhrFinished();