
page does not wait for another page to finish its tasks before continuing

So here's the code snippet:

    for (let item of items)
    {
        await page.waitFor(10000)
        await page.click("#item_"+item)
        await page.click("#i"+item)

        let pages = await browser.pages()
        let tempPage = pages[pages.length-1]

        await tempPage.waitFor("a.orange", {timeout: 60000, visible: true})
        await tempPage.click("a.orange")

        counter++
    }

page and tempPage are two different pages.

What happens is that page waits for 10 seconds, then clicks some stuff, which opens a second page.

What's supposed to happen is that tempPage waits for an element, clicks it, then page should wait 10 seconds before doing it all over again.

However, what actually happens is that page waits for 10 seconds, clicks the stuff, then starts waiting for 10 seconds without waiting for tempPage to finish its tasks.

Is this a bug, or am I misunderstanding something? How should I fix this so that when the for loop loops again, it is only after tempPage has clicked.

asked Dec 04 '17 00:12 by A. L




1 Answer

Generally, you cannot rely on await tempPage.click("a.orange") to pause execution until tempPage has "finish[ed] its tasks". For super simple code that executes synchronously, it may work. But in general, you cannot rely on it.

If the click triggers an Ajax operation, or starts a CSS animation, or starts a computation that cannot be immediately computed, or opens a new page, etc., then the result you are waiting for is asynchronous, and the .click method will not wait for this asynchronous operation to complete.

What can you do? In some cases you may be able to hook into the code that is running on the page and wait for some event that matters to you. For instance, if you want to wait for an Ajax operation to be done and the code on the page uses jQuery, then you might use ajaxComplete to detect when the operation is complete. If you cannot hook into any event system to detect when the operation is done, then you may need to poll the page to wait for evidence that the operation is done.
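The polling approach mentioned above can be sketched as a small Node-side helper. This is only an illustration: `pollUntil` and its `interval` and `timeout` options are hypothetical names, not part of Puppeteer.

```javascript
// Hypothetical helper (not part of Puppeteer): repeatedly calls `check`
// until it returns a truthy value, or throws once `timeout` ms have elapsed.
async function pollUntil(check, { interval = 100, timeout = 30000 } = {}) {
    const deadline = Date.now() + timeout;
    for (;;) {
        const result = await check();
        if (result) {
            return result;
        }
        if (Date.now() >= deadline) {
            throw new Error(`Condition not met within ${timeout} ms`);
        }
        // Sleep on the Node side before asking again.
        await new Promise(resolve => setTimeout(resolve, interval));
    }
}
```

With it, waiting for a second tab could look like `await pollUntil(async () => (await browser.pages()).length > 1);`. For conditions that live inside the page itself, Puppeteer's own `page.waitForFunction` already does this kind of polling.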

Here is an example that shows the issue:

const puppeteer = require('puppeteer');

function getResults(page) {
    return page.evaluate(() => ({
        clicked: window.clicked,
        asynchronousResponse: window.asynchronousResponse,
    }));
}

puppeteer.launch().then(async browser => {
    const page = await browser.newPage();
    await page.goto("https://example.com");
    // We add a button to the page that we will click later.
    await page.evaluate(() => {
        const button = document.createElement("button");
        button.id = "myButton";
        button.textContent = "My Button";
        document.body.appendChild(button);
        window.clicked = 0;
        window.asynchronousResponse = 0;
        button.addEventListener("click", () => {
            // Synchronous operation
            window.clicked++;

            // Asynchronous operation.
            setTimeout(() => {
                window.asynchronousResponse++;
            }, 1000);
        });
    });

    console.log("before clicks", await getResults(page));

    const button = await page.$("#myButton");
    await button.click();
    await button.click();
    console.log("after clicks", await getResults(page));

    await page.waitForFunction(() => window.asynchronousResponse === 2);
    console.log("after wait", await getResults(page));

    await browser.close();
});

The setTimeout code simulates any kind of asynchronous operation started by the click.

When you run this code, you'll see on the console:

before clicks { clicked: 0, asynchronousResponse: 0 }
after clicks { clicked: 2, asynchronousResponse: 0 }
after wait { clicked: 2, asynchronousResponse: 2 }

You see that clicked is immediately incremented twice by the two clicks. However, it takes a while before asynchronousResponse is incremented. The statement await page.waitForFunction(() => window.asynchronousResponse === 2) polls the page until the condition we are waiting for is realized.


You mentioned in a comment that the button is closing the tab. Opening and closing tabs are asynchronous operations. Here's an example:

const puppeteer = require('puppeteer');

puppeteer.launch().then(async browser => {
    let pages = await browser.pages();
    console.log("number of pages", pages.length);
    const page = pages[0];
    await page.goto("https://example.com");
    await page.evaluate(() => {
        window.open("https://example.com");
    });

    do {
        pages = await browser.pages();
        // For whatever reason, I need to have this here otherwise
        // browser.pages() always returns the same value. And the loop
        // never terminates.
        await page.evaluate(() => {});
        console.log("number of pages after evaluating open", pages.length);
    } while (pages.length === 1);

    let tempPage = pages[pages.length - 1];

    // Add a button that will close the page when we click it.
    await tempPage.evaluate(() => {
        const button = document.createElement("button");
        button.id = "myButton";
        button.textContent = "My Button";
        document.body.appendChild(button);
        window.clicked = 0;
        window.asynchronousResponse = 0;
        button.addEventListener("click", () => {
            window.close();
        });
    });

    const button = await tempPage.$("#myButton");
    await button.click();

    do {
        pages = await browser.pages();
        // For whatever reason, I need to have this here otherwise
        // browser.pages() always returns the same value. And the loop
        // never terminates.
        await page.evaluate(() => {});
        console.log("number of pages after click", pages.length);
    } while (pages.length > 1);

    await browser.close();
});

When I run the above, I get:

number of pages 1
number of pages after evaluating open 1
number of pages after evaluating open 1
number of pages after evaluating open 2
number of pages after click 2
number of pages after click 1

You can see it takes a bit before window.open() and window.close() have detectable effects.
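Instead of polling `browser.pages()`, you may be able to wait on the events Puppeteer emits: `targetcreated` on the browser and `close` on a page. Below is a minimal sketch of one iteration of the loop from the question, assuming that one of the two clicks on `page` opens the new tab and that clicking `a.orange` closes it. The helper names `nextNewPage`, `pageClosed` and `processItem` are made up for illustration, and this has not been tried against the site in the question.

```javascript
// Resolves with the next page (tab or popup) that the browser opens.
// Call this BEFORE triggering the click so the event cannot be missed.
function nextNewPage(browser) {
    return new Promise(resolve => {
        const onTarget = async target => {
            if (target.type() !== "page") return; // ignore non-page targets
            browser.off("targetcreated", onTarget);
            resolve(await target.page());
        };
        browser.on("targetcreated", onTarget);
    });
}

// Resolves once `page` has emitted its "close" event.
function pageClosed(page) {
    if (page.isClosed()) return Promise.resolve();
    return new Promise(resolve => page.once("close", resolve));
}

// One iteration of the question's loop, rewritten with the helpers above.
async function processItem(browser, page, item) {
    const newPage = nextNewPage(browser); // start listening first
    await page.click("#item_" + item);
    await page.click("#i" + item);
    const tempPage = await newPage;

    await tempPage.waitForSelector("a.orange", { timeout: 60000, visible: true });
    const closed = pageClosed(tempPage); // listen before clicking
    await tempPage.click("a.orange");
    await closed; // only now has tempPage finished
}
```

The loop body then becomes `await processItem(browser, page, item);`, and the next iteration starts only after the tab has actually closed. If the click does not close the tab on your site, `await closed` never resolves, so you would wait for some other piece of evidence instead.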


In your comment you also wrote:

I thought await was basically what turned an asynchronous function into a synchronous one

I would not say it turns asynchronous functions into synchronous ones. It makes the current code wait for an asynchronous operation's promise to be resolved or rejected. However, more importantly for the issue at hand here, the problem is that you have two virtual machines executing JavaScript code: there's Node which runs puppeteer and the script that controls the browser, and there's the browser itself which has its own JavaScript virtual machine. Any await that you use on the Node side affects only the Node code: it has no bearing on the code that runs in the browser.

It can get confusing when you see things like await page.evaluate(() => { some code; }). It looks like it is all of one piece, and all executing in the same virtual machine, but it is not. puppeteer takes the parameter passed to .evaluate, serializes it, and sends it over to the browser, where it executes. Try adding something like await page.evaluate(() => { button.click(); }); in the script above, after const button = .... Something like this:

const button = await tempPage.$("#myButton");
await button.click();
await page.evaluate(() => { button.click(); });

In the script, button is defined before page.evaluate, but you'll get a ReferenceError when page.evaluate runs because button is not defined on the browser side!
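You can reproduce the same effect in plain Node, without a browser. The sketch below only mimics the idea described above (a function is turned into source text and rebuilt somewhere that cannot see the caller's local variables). It is not how Puppeteer is implemented internally, and `runElsewhere` and `demo` are hypothetical names.

```javascript
// Hypothetical stand-in for "send the function to another JavaScript VM":
// turn it into source text, then rebuild it in the global scope, where the
// caller's local variables are not visible. Extra arguments are passed along.
function runElsewhere(fn, ...args) {
    const source = fn.toString();
    const rebuilt = new Function(`return (${source});`)();
    return rebuilt(...args);
}

function demo() {
    const button = { click: () => "clicked" };

    let errorName = null;
    try {
        // `button` is a closure variable here; the rebuilt function cannot see it.
        runElsewhere(() => button.click());
    } catch (err) {
        errorName = err.name;
    }

    // Passing the value as an explicit argument works.
    const result = runElsewhere(b => b.click(), button);
    return { errorName, result };
}

console.log(demo());
```

The second call mirrors the way `page.evaluate` accepts extra arguments. In the Puppeteer script, the corresponding fix would be something like `await tempPage.evaluate(b => b.click(), button);`, called on the page the handle came from (`tempPage`, not `page`).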

answered Sep 24 '22 05:09 by Louis