
How to speed up puppeteer?

A web page has a button, and Puppeteer must click it as soon as it becomes visible. The button is not always there; it becomes visible for everyone at the same time, so I have to refresh the page constantly to detect when it appears. I wrote the script below to do that:

    const puppeteer = require('puppeteer')

    const browser = await puppeteer.launch({
        headless: true,
        args: ['--no-sandbox']
    })
    const page = await browser.newPage()
    await page.setViewport({ width: 1920, height: 1080 })

    // I am calling my pageRefresher method here

    // Number of failed attempts so far (used only for logging)
    let counter = 0

    async function pageRefresher(page, browser, url) {
        try {
            await page.goto(url, { waitUntil: 'networkidle2' })
            try {
                // Give the button 10 ms to show up, then click it
                await page.waitForSelector('#ourButton', { timeout: 10 })
                await page.click('#ourButton')
                console.log('clicked!')
                await browser.close()
            } catch (error) {
                // Button not found (or click failed): log and retry
                console.log('catch2 ' + counter + ' ' + error)
                counter += 1
                await pageRefresher(page, browser, url)
            }
        } catch (error) {
            // Navigation failed: give up
            console.log('catch3 ' + error)
            await browser.close()
        }
    }

As you can see, my method is recursive. It goes to the page and looks for the button. If the button is not there, it calls itself again and repeats the same job until it finds and clicks the button.

It works right now, but it is slow. While the script runs, I open the same page in desktop Chrome and refresh it manually. I always win: I always click the button before Puppeteer does.

How can I speed up this process? A script should not lose to a human who has just manual controls like F5 button.

Tolgay Toklar asked Jul 11 '20 17:07





1 Answer

A script should not lose to a human who has just manual controls like F5 button.

This happens because the rules Puppeteer follows are sometimes much stricter than what a human considers a "fully loaded" page. As a human, you can tell at a glance whether the desired element is already there: you can see that the button is missing even while a background image is still loading, or while web fonts have not loaded yet and fallback fonts are shown. Puppeteer, on the other hand, waits for specific events before it either falls into the catch block (timeout) or grabs the desired element (waitForSelector succeeds). How much this costs depends on the site you are visiting, but you can speed up detection of the desired element.

Here are some examples and ideas for how to achieve this.


Ways to speed up recognition of the desired element

1.) If your task does not need every network connection, you can speed up page loading by replacing waitUntil: 'networkidle2' with waitUntil: 'domcontentloaded'. This event usually fires earlier, and if #ourButton is part of the initial HTML it will already be in the DOM by the time the event fires.

The possible waitUntil values for page.goto/page.reload:

  • load - consider navigation to be finished when the load event is fired.
  • domcontentloaded - consider navigation to be finished when the DOMContentLoaded event is fired.
  • networkidle0 - consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
  • networkidle2 - consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.

You are beating the script because networkidle2 is too strict. You may need this option (e.g. you are visiting a single-page application, or you will later need data from third-party network requests, such as cookies), but if it is not mandatory you will get better performance with domcontentloaded.
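To see how much the waitUntil choice matters on a particular site, one option is to time the same navigation under each setting. This is a minimal sketch rather than a proper benchmark: timeNavigation and compareWaitUntil are hypothetical helper names, and the code only assumes that page exposes goto(url, options) as Puppeteer's Page does.

```javascript
// Hypothetical helper: times one navigation for a given waitUntil value.
// `page` is assumed to be a Puppeteer Page (only page.goto is used).
async function timeNavigation(page, url, waitUntil) {
  const start = Date.now()
  await page.goto(url, { waitUntil })
  return Date.now() - start
}

// Hypothetical helper: times the same URL under several waitUntil values,
// one after another, and returns an object of elapsed milliseconds.
async function compareWaitUntil(
  page,
  url,
  events = ['domcontentloaded', 'load', 'networkidle2']
) {
  const timings = {}
  for (const waitUntil of events) {
    timings[waitUntil] = await timeNavigation(page, url, waitUntil)
  }
  return timings
}
```

Inside an async function, console.log(await compareWaitUntil(page, url)) prints one number per option. Later navigations may benefit from the browser cache, so the numbers are only a rough indication.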

2.) Instead of navigating to the same URL again and again, you can use the page.reload method in a loop, e.g.:

await page.goto(url, { waitUntil: 'domcontentloaded' })
let selectorExists = await page.$('#ourButton')

while (selectorExists === null) {
  await page.reload({ waitUntil: 'domcontentloaded' })
  console.log('reload')
  selectorExists = await page.$('#ourButton')
}
await page.click('#ourButton')
// code goes on...

The main benefit is that you can shorten and simplify your pageRefresher function. I also experienced better performance: I did no benchmarking, but it felt much faster than re-opening the page.
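The reload loop above runs forever if the button never appears. A bounded variant could look like the sketch below. reloadUntilPresent and maxReloads are hypothetical names introduced here for illustration, and page is assumed to be a Puppeteer Page (only page.$ and page.reload are used).

```javascript
// Hypothetical helper: reloads the page until `selector` is found or
// `maxReloads` reloads have been made.
// Resolves to { handle, reloads }, where handle is null if nothing was found.
async function reloadUntilPresent(page, selector, maxReloads = 1000) {
  let handle = await page.$(selector)
  let reloads = 0
  while (handle === null && reloads < maxReloads) {
    await page.reload({ waitUntil: 'domcontentloaded' })
    reloads += 1
    handle = await page.$(selector)
  }
  return { handle, reloads }
}
```

A possible usage, inside an async function: const { handle } = await reloadUntilPresent(page, '#ourButton'), followed by if (handle) await handle.click(). Clicking the returned element handle reuses the element that was already found instead of querying the selector a second time.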

3.) If your task does not need every resource type, you can also speed up page loading by blocking resources such as images or CSS through request interception. The following script blocks images:

await page.setRequestInterception(true)
page.on('request', (request) => {
  if (request.resourceType() === 'image') request.abort()
  else request.continue()
})

[source]

List of resourceType-s.
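The same interception idea extends to more than one resource type. The sketch below is one way to do it, under assumptions: the set of blocked types is only an example, shouldBlock and blockHeavyResources are hypothetical names, and page is assumed to be a Puppeteer Page.

```javascript
// Resource types to skip. This particular set is an assumption; trim it if the
// page needs its CSS or fonts for the button to be rendered and clickable.
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font', 'media'])

// Pure decision function, kept separate so the policy is easy to adjust.
function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType)
}

// Hypothetical helper: enables interception and aborts blocked resource types.
async function blockHeavyResources(page) {
  await page.setRequestInterception(true)
  page.on('request', (request) => {
    if (shouldBlock(request.resourceType())) request.abort()
    else request.continue()
  })
}
```

Calling await blockHeavyResources(page) once, before the first page.goto, is enough; the handler stays active across reloads. Blocking stylesheets can change layout and visibility, so it is worth checking that the click still works on the target site.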

theDavidBarton answered Oct 14 '22 04:10
