
Puppeteer: Chromium instances remain active in the background after browser.disconnect

My environment

  • Puppeteer version: 3.1.0
  • Platform / OS version: Windows 10
  • Node.js version: 12.16.1

My problem is:

I have a for...of loop that visits 3000+ URLs with Puppeteer. I launch one browser and pass its wsEndpoint to puppeteer.connect so that I can reuse that single browser instance. After each visit I close the tab and disconnect.

  • for the first 100 URLs, page.goto opens each URL immediately,
  • after 100 URLs, page.goto needs 2-3 retries per URL,
  • after 300 URLs, page.goto needs 5-8 retries per URL,
  • after 500 URLs, I get TimeoutError: Navigation timeout of 30000 ms exceeded every time.

I checked the Windows Task Manager and found hundreds of Chromium instances running in the background, each using 80-90 MB of memory and 1-2% of CPU.

Question

How can I actually kill the Chromium instances that I've already disconnected from with browser.disconnect?

Example script

const puppeteer = require('puppeteer')
const urlArray = require('./urls.json') // contains 3000+ urls in an array


async function fn() {
  const browser = await puppeteer.launch({ headless: true })
  const browserWSEndpoint = await browser.wsEndpoint()

  for (const url of urlArray) {
    try {
      const browser2 = await puppeteer.connect({ browserWSEndpoint })
      const page = await browser2.newPage()
      await page.goto(url) // in my original code it's also wrapped in a retry function

      // doing cool things with the DOM

      await page.goto('about:blank') // because of you: https://github.com/puppeteer/puppeteer/issues/1490
      await page.close()
      await browser2.disconnect()
    } catch (e) {
      console.error(e)
    }
  }
  await browser.close()
}
fn()
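
The retry wrapper mentioned in the script's comment is not shown in the question. A minimal sketch of what such a wrapper might look like follows; the name `retry` and the parameters `attempts` and `delayMs` are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helper: the asker's actual retry function is not shown.
// Calls `fn` until it resolves, at most `attempts` times, waiting `delayMs`
// between tries. Rethrows the last error if every attempt fails.
async function retry(fn, attempts = 3, delayMs = 1000) {
  let lastError
  for (let i = 1; i <= attempts; i++) {
    try {
      // Pass the attempt number so the caller can log it if needed.
      return await fn(i)
    } catch (e) {
      lastError = e
      if (i < attempts) {
        await new Promise(resolve => setTimeout(resolve, delayMs))
      }
    }
  }
  throw lastError
}
```

Under these assumptions, the navigation line in the loop would become `await retry(() => page.goto(url))`. Retrying only hides the symptom here: it does not free the leftover Chromium processes.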

The error

The usual puppeteer timeout error.

TimeoutError: Navigation timeout of 30000 ms exceeded
    at C:\[...]\node_modules\puppeteer\lib\LifecycleWatcher.js:100:111
  -- ASYNC --
    at Frame.<anonymous> (C:\[...]\node_modules\puppeteer\lib\helper.js:94:19)
    at Page.goto (C:\[...]\node_modules\puppeteer\lib\Page.js:476:53)
    at Page.<anonymous> (C:\[...]\node_modules\puppeteer\lib\helper.js:95:27)
    at example (C:\[...]\example.js:13:18)
    at processTicksAndRejections (internal/process/task_queues.js:97:5) {
  name: 'TimeoutError'
}

theDavidBarton asked Jun 05 '20 17:06


1 Answer

I finally achieved the desired result by adding the --single-process and --no-zygote args at launch (--no-sandbox is also required with them).

The number of running Chromium processes no longer keeps growing; only two instances remain active: one is the usual empty tab in the first position, and the other is reused correctly by puppeteer.connect({ browserWSEndpoint }).

[...]
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--single-process', '--no-zygote', '--no-sandbox']
  })
  const browserWSEndpoint = await browser.wsEndpoint()
[...]

  • --single-process: Runs the renderer and plugins in the same process as the browser [source]

  • --no-zygote: Disables the use of a zygote process for forking child processes. Instead, child processes will be forked and exec'd directly. Note that --no-sandbox should also be used together with this flag because the sandbox needs the zygote to work. [source]
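
For context, here is a sketch of how these launch flags could fit into the loop from the question. It is not the exact script used for the result above. The names `visitAll`, `handlePage` and `LAUNCH_ARGS` are hypothetical, and the `puppeteer` module is passed in as a parameter so the function can be exercised with a stub. The `finally` blocks are an addition of this sketch, so that the tab and the connection are released even when page.goto throws.

```javascript
// Flags from the answer; --no-sandbox is needed alongside the other two.
const LAUNCH_ARGS = ['--single-process', '--no-zygote', '--no-sandbox']

// Hypothetical helper for illustration. `puppeteer` is the required module,
// `handlePage` is the caller's own async function that works with the DOM.
// Resolves to the list of URLs that failed, with their errors.
async function visitAll(puppeteer, urls, handlePage) {
  const browser = await puppeteer.launch({ headless: true, args: LAUNCH_ARGS })
  const browserWSEndpoint = browser.wsEndpoint()
  const failed = []
  try {
    for (const url of urls) {
      let browser2
      let page
      try {
        browser2 = await puppeteer.connect({ browserWSEndpoint })
        page = await browser2.newPage()
        await page.goto(url)
        await handlePage(page, url)
      } catch (e) {
        failed.push({ url, error: e })
      } finally {
        // Release the tab and the connection even if navigation failed.
        if (page) await page.close().catch(() => {})
        if (browser2) await browser2.disconnect()
      }
    }
  } finally {
    // Closing the launched browser ends the session for good.
    await browser.close()
  }
  return failed
}
```

A call could look like `visitAll(require('puppeteer'), urlArray, async page => { /* DOM work */ })`. Whether the extra cleanup changes the process count on its own has not been measured here; the flags are what made the difference in the result described above.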

theDavidBarton answered Oct 21 '22 02:10