
Puppeteer Not Triggering Click Before Returning HTML

My Node.js Puppeteer script fills out a form successfully, but a later click on an element only registers some of the time before the script returns the modified page content. Here's the script:

const fetchContracts = async (url) => {
    const browser = await puppeteer.launch({ headless: true, args: ['--no-sandbox', '--disable-setuid-sandbox'] });
    const page = await browser.newPage();
    const pendingXHR = new PendingXHR(page);


    await page.goto(url, { waitUntil: 'networkidle2' });
    await Promise.all([
        page.click("#agree_statement"),
        page.waitForNavigation()
    ]);

    await page.click(".form-check-input");

    await Promise.all([
        page.click(".btn-primary"),
        page.waitForNavigation()
    ]);    

    /// MY PROBLEM OCCURS HERE
    /// Sometimes these clicks do not register....
    await page.click('#filedReports th:nth-child(5)')
    await pendingXHR.waitForAllXhrFinished();
    await page.click('#filedReports th:nth-child(5)');
    await pendingXHR.waitForAllXhrFinished();

    /// And my bot skips directly here....
    let html = await page.content();
    await page.close();
    await browser.close();
    return html;

}

PendingXHR is imported at the top of my file from this library:

const { PendingXHR } = require('pending-xhr-puppeteer');

The script works on my local computer, and works some of the time when I upload the script to Digital Ocean. According to the page that I am crawling, these clicks initiate XHR requests, which I am attempting to wait for. Here's proof:

[screenshot showing the XHR requests]

So my question is:

Why would these clicks not register, even though I am awaiting them and awaiting the XHR requests, before the html is pulled from the page and then returned? And why the inconsistency with this, where sometimes the clicks are registered and sometimes they are not?

Thanks for your help.

Harrison Cramer asked Mar 27 '19 03:03


2 Answers

Short answer: The click triggers a delayed AJAX request, so pendingXHR.waitForAllXhrFinished() resolves immediately because no request is in flight at the time the function is executed. Use page.waitForResponse('.../data/') instead.

Problem

You are expecting the following sequence of events to happen:

  1. Click happens
  2. AJAX request starts
  3. pendingXHR.waitForAllXhrFinished() executed
  4. AJAX request finishes
  5. Table is rendered
  6. pendingXHR.waitForAllXhrFinished() resolves
  7. page.content() executed

The problem is that the library you are using (PendingXHR) waits only for the requests that are pending at the moment it is called and resolves as soon as they finish. This does not work in two cases that I can think of:

1. The AJAX request is started asynchronously

In this case, the order of the events would be like this:

  1. Click happens, but starts the AJAX call asynchronously (later)
  2. pendingXHR.waitForAllXhrFinished() executed
  3. pendingXHR.waitForAllXhrFinished() resolves immediately (as there are no requests)
  4. page.content() executed (too early!)
  5. AJAX request starts
  6. AJAX request finishes
  7. Table is rendered
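
The race in case 1 can be reproduced without a browser. The following is a minimal sketch in plain Node.js. PendingTracker and simulate are hypothetical names made up for illustration, not the real pending-xhr-puppeteer implementation, and the sketch assumes (as described above) that the tracker only waits for requests already in flight when it is called.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for a "pending request" tracker (not the real library).
class PendingTracker {
  constructor() {
    this.pending = new Set();
  }

  // Remember a request until it settles.
  track(promise) {
    this.pending.add(promise);
    promise.finally(() => this.pending.delete(promise));
    return promise;
  }

  // Resolves once every request that is pending *right now* has settled.
  // With nothing pending, this resolves immediately.
  waitForAll() {
    return Promise.all([...this.pending]);
  }
}

// Simulates "click, wait for XHR, read the page".
// delayBeforeRequestMs = 0 means the click handler starts the request at once.
async function simulate(delayBeforeRequestMs) {
  const tracker = new PendingTracker();
  let tableRendered = false;

  const startRequest = () =>
    tracker.track(sleep(20).then(() => { tableRendered = true; }));

  // The "click": either fires the request synchronously or defers it.
  if (delayBeforeRequestMs === 0) {
    startRequest();
  } else {
    setTimeout(startRequest, delayBeforeRequestMs);
  }

  await tracker.waitForAll();
  return tableRendered; // what reading the page would see at this point
}

simulate(0).then((ok) => console.log('request starts immediately:', ok)); // true
simulate(5).then((ok) => console.log('request starts 5 ms later:', ok));  // false
```

In the second call the tracker has nothing to wait for yet, so the "page" is read before the request has even started.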

2. The UI modifies the table asynchronously

In this case, the order of the events would be like this:

  1. Click happens
  2. AJAX request starts
  3. pendingXHR.waitForAllXhrFinished() executed
  4. AJAX request finishes (but the code renders the table later)
  5. pendingXHR.waitForAllXhrFinished() resolves
  6. page.content() (too early!)
  7. Table is rendered
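
Case 2 can be sketched the same way. This is only an illustration in plain Node.js under assumed timings (a 20 ms request and a render that runs 5 ms after the response); simulateLateRender is a hypothetical name, and the extra wait stands in for a fixed delay in the real script.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Simulates a page that renders the table slightly *after* the response arrives.
// extraWaitMs is the fixed delay added after the request has finished.
async function simulateLateRender(extraWaitMs) {
  let tableRendered = false;

  // The request takes 20 ms; the page then schedules the render 5 ms later.
  const request = sleep(20).then(() => {
    setTimeout(() => { tableRendered = true; }, 5);
  });

  await request; // all a request-based wait can guarantee
  if (extraWaitMs > 0) {
    await sleep(extraWaitMs); // give the asynchronous render time to run
  }
  return tableRendered; // what reading the page would see at this point
}

simulateLateRender(0).then((ok) => console.log('no extra wait:', ok));     // false
simulateLateRender(50).then((ok) => console.log('50 ms extra wait:', ok)); // true
```

Here the request has finished in both runs, but only the run with the extra wait sees the rendered table.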

The inconsistency arises because this is a race: a single millisecond can decide which event happens first, so sometimes the events do occur in the expected order.

Fix

Without looking at the code of the page, I cannot say for sure which case it is (it might actually be both), but I would guess it is the first one, as I can easily imagine the table library waiting for possible double clicks, dragging, etc. before it makes the AJAX request.

The first problem can be fixed by using page.waitForResponse instead of pendingXHR.waitForAllXhrFinished as this makes sure that the request to data/ has actually happened.

Fixing the second case (if necessary) is less trivial, but it can be done by adding a short fixed wait with page.waitFor(10).

By fixing both cases, the new code looks like this:

await Promise.all([ // wait for the response to happen and click
    page.waitForResponse('.../data/'), // use the actual URL here
    page.click('...'),
]);
await page.waitFor(10); // wait for any asynchronous rerenders that might happen
let html = await page.content();
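
If the same click-and-wait step is needed more than once (the question clicks the column header twice), it could be wrapped in a small helper. This is a sketch, not a definitive implementation: clickAndWaitForData, urlFragment and settleMs are hypothetical names, the predicate form of page.waitForResponse is used so that only part of the URL has to match, and a plain timer replaces page.waitFor so the sketch does not depend on which timeout helper the installed Puppeteer version provides.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: clicks `selector`, waits for a response whose URL
// contains `urlFragment`, then pauses `settleMs` for asynchronous rerenders.
async function clickAndWaitForData(page, selector, urlFragment, settleMs = 10) {
  const [response] = await Promise.all([
    // Registered before the click is issued, so the response cannot be missed.
    page.waitForResponse((res) => res.url().includes(urlFragment)),
    page.click(selector),
  ]);
  await sleep(settleMs);
  return response;
}
```

In the script from the question this would be called twice in a row with the '#filedReports th:nth-child(5)' selector and the actual data URL (or a fragment of it) before reading page.content().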

Thomas Dondorf answered Oct 23 '22 12:10


Did you try a workaround like this?

await page.waitFor(1000); // wait for 1 second

That gives the page time to load. A better way is to put the page.click in a Promise.all, like this:

await Promise.all([
    // pass the promises themselves; an inner await would run them one after another
    page.click('#filedReports th:nth-child(5)'),
    pendingXHR.waitForAllXhrFinished()
]);

PS: you are missing a semicolon here:


/// MY PROBLEM OCCURS HERE
/// Sometimes these clicks do not register....  
                                                \/
await page.click('#filedReports th:nth-child(5)')
await pendingXHR.waitForAllXhrFinished();       /\
await page.click('#filedReports th:nth-child(5)');
await pendingXHR.waitForAllXhrFinished();

Rakan Habab answered Oct 23 '22 10:10