Is it safe/supported to run multiple instances of Puppeteer at the same time, either (1) at the process level (multiple node screenshot.js processes at the same time) or (2) at the script level (multiple puppeteer.launch() calls at the same time)? What are the recommended settings/limits on parallel processes?
(In my tests, (1) seems to work fine, but I'm wondering about the reliability of Puppeteer's interactions with the single (?) instance of Chrome. I haven't tried (2) but that seems less likely to work out.)
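A minimal sketch of option (2), assuming the puppeteer package is installed. Each puppeteer.launch() call starts its own Chrome process, so several browsers can run side by side in one script. runInSeparateBrowsers is a hypothetical helper, not part of Puppeteer. The puppeteer module is a parameter (defaulting to the real package) only so the function can be tried with a stand-in.

```javascript
// Hypothetical helper for option (2): one independent browser per job, all
// launched from the same Node script. Each puppeteer.launch() starts its own
// Chrome process with a fresh temporary profile unless userDataDir is set.
async function runInSeparateBrowsers(jobs, puppeteer = require('puppeteer')) {
  return Promise.all(
    jobs.map(async (job) => {
      const browser = await puppeteer.launch();
      try {
        const page = await browser.newPage();
        return await job(page);
      } finally {
        await browser.close(); // always release the Chrome process, even on error
      }
    }),
  );
}
```

Because every job gets its own browser, memory use grows with the number of jobs, so this shape only suits a small, fixed number of parallel tasks.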
Memory requirements: Actors using Puppeteer need at least 1 GB of memory; large and complex sites like Google Maps need at least 4 GB for optimal speed and concurrency.
Puppeteer is a Node library which provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol.
Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.
It's fine to run multiple browsers, contexts, or even pages in parallel. The limits depend on your network, disk, and memory, and on what each task does.
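Running pages in parallel still needs some cap, since the limit depends on the machine. A minimal sketch, assuming the puppeteer package is installed: mapWithConcurrency is a hypothetical helper that keeps at most limit jobs in flight, and screenshotAll (also hypothetical) uses it to drive several tabs of a single browser. The default limit of 4 and the shot-N.png file names are arbitrary.

```javascript
// Hypothetical helper: run worker(item, index) for every item with at most
// `limit` calls in flight at once (limit should be >= 1).
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  // Each runner keeps claiming the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Hypothetical usage: screenshot many URLs with one browser and up to
// `limit` open tabs at a time.
async function screenshotAll(urls, limit = 4) {
  const puppeteer = require('puppeteer'); // loaded here so the helper above works without it
  const browser = await puppeteer.launch();
  try {
    return await mapWithConcurrency(urls, limit, async (url, i) => {
      const page = await browser.newPage();
      try {
        await page.goto(url);
        const path = `shot-${i}.png`;
        await page.screenshot({ path });
        return path;
      } finally {
        await page.close();
      }
    });
  } finally {
    await browser.close();
  }
}
```

In this sketch a single failing URL rejects the whole batch; tuning limit up or down is how you would trade throughput against memory and network load.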
I have crawled a few million pages, and from time to time (in my setup, roughly every 10,000 pages) Puppeteer crashes. You should therefore have a way to automatically restart the browser and retry the job.
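Without an extra library, the restart-and-retry idea can be sketched with plain Puppeteer: keep one shared browser, forget it when it emits its 'disconnected' event so the next job relaunches it, and retry each job a fixed number of times. createBrowserManager and runWithRetry are hypothetical helpers, not Puppeteer APIs, and the default of 2 retries is an arbitrary assumption. The launch function is passed in (for example () => puppeteer.launch()) so the helpers do not depend on a particular setup.

```javascript
// Hypothetical helper: hands out one shared browser; after it disconnects
// (crash or close), the next get() call launches a fresh one.
function createBrowserManager(launch) {
  let browserPromise = null;
  return {
    get() {
      if (!browserPromise) {
        const current = launch().then(
          (browser) => {
            // Puppeteer's Browser emits 'disconnected' when Chrome goes away.
            browser.on('disconnected', () => {
              if (browserPromise === current) browserPromise = null;
            });
            return browser;
          },
          (err) => {
            if (browserPromise === current) browserPromise = null; // allow a new launch attempt
            throw err;
          },
        );
        browserPromise = current;
      }
      return browserPromise;
    },
  };
}

// Hypothetical helper: runs job(page) on a fresh page, retrying up to
// `retries` extra times before giving up with the last error.
async function runWithRetry(manager, job, retries = 2) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    let page;
    try {
      const browser = await manager.get();
      page = await browser.newPage();
      return await job(page);
    } catch (err) {
      lastError = err; // remember the failure and try again
    } finally {
      // Closing can fail if the browser crashed, so ignore that error.
      if (page) await page.close().catch(() => {});
    }
  }
  throw lastError;
}
```

A crawler would create one manager with createBrowserManager(() => puppeteer.launch()) and wrap every job in runWithRetry, so a crashed browser only costs the jobs that were running on it at the time.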
You might want to check out puppeteer-cluster, which takes care of pooling browser instances as well as detecting crashes and restarting browsers. (Disclaimer: I'm the author.)
Here is an example of creating a cluster:
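const { Cluster } = require('puppeteer-cluster');

(async () => {
  // create a cluster that handles 10 parallel browsers
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_BROWSER,
    maxConcurrency: 10,
  });

  // Queue your jobs (one example)
  cluster.queue(async ({ page }) => {
    await page.goto('http://www.wikipedia.org');
    await page.screenshot({ path: 'wikipedia.png' });
  });

  // wait until all queued jobs are done, then shut the browsers down
  await cluster.idle();
  await cluster.close();
})();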
This is just a minimal example. There are many more ways to use the cluster.
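One of those other ways, sketched under assumptions: instead of queueing functions, register a single task with cluster.task() and queue plain data (here URLs), and let the cluster retry failed jobs via the retryLimit and retryDelay options, reporting final failures through the 'taskerror' event. These options and the event come from the puppeteer-cluster README; the function name crawlTitles and the concrete values (4 workers, 2 retries, 1 s delay) are illustrative. The Cluster class is a parameter (defaulting to the real one) only so the function can be exercised without launching Chrome.

```javascript
// Hypothetical example: collect page titles, letting puppeteer-cluster
// re-queue failed jobs instead of writing retry logic by hand.
async function crawlTitles(urls, Cluster = require('puppeteer-cluster').Cluster) {
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT, // each job gets its own incognito context
    maxConcurrency: 4,
    retryLimit: 2, // re-queue a failed job up to 2 more times
    retryDelay: 1000, // wait 1 s before a retry
  });
  const titles = new Map();
  const failed = [];

  // One task function shared by every queued URL.
  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    titles.set(url, await page.title());
  });

  // Emitted for every failed attempt; willRetry is false on the last one.
  cluster.on('taskerror', (err, url, willRetry) => {
    if (!willRetry) failed.push(url);
  });

  for (const url of urls) cluster.queue(url);
  await cluster.idle(); // wait until the queue is drained
  await cluster.close();
  return { titles, failed };
}
```

Keeping the retry policy in the cluster options means a crashed page or browser is handled the same way as a timeout or a thrown error.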