I'm running puppeteer on express/node/ubuntu as follow:
var puppeteer = require('puppeteer');
var express = require('express');
var router = express.Router();
/* GET home page. */
router.get('/', function(req, res, next) {
(async () => {
headless = true;
const browser = await puppeteer.launch({headless: true, args:['--no-sandbox']});
const page = await browser.newPage();
url = req.query.url;
await page.goto(url);
let bodyHTML = await page.evaluate(() => document.body.innerHTML);
res.send(bodyHTML)
await browser.close();
})();
});
running this script multiple times leaves hundred of Zombies:
$ pgrep chrome | wc -l
133
Which clogs the srv,
How do I fix this?
Running kill
from a Express JS script could solve it?
Is there a better way to get the same result other than puppeteer and headless chrome?
Advertisements. By default, Puppeteer executes the test in headless Chromium. This means if we are running a test using Puppeteer, then we won't be able to view the execution in the browser. To enable execution in the headed mode, we have to add the parameter: headless:false in the code.
Making Puppeteer Undetectable For puppeteer, there is a stealth plugin that implements a lot of browser stealth tricks. Let's install it and add it to the script. And that's it, it will be very hard to detect the Puppeteer browser now as being a scraping-bot.
Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.
Ahhh! This is a simple oversight. What if an error occurs and your await browser.close()
never executes thus leaving you with zombies.
Using shell.js
seems to be a hacky way of solving this issue.
The better practice is to use try..catch..finally
. The reason being you would want the browser to be closed irrespective of a happy flow or an error being thrown.
And unlike the other code snippet, you don't have to try and close the browser in the both the catch
block and finally
block. finally
block is always executed irrespective of whether an error is thrown or not.
So, your code should look like,
const puppeteer = require('puppeteer');
const express = require('express');
const router = express.Router();
/* GET home page. */
router.get('/', function(req, res, next) {
(async () => {
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox'],
});
try {
const page = await browser.newPage();
url = req.query.url;
await page.goto(url);
const bodyHTML = await page.evaluate(() => document.body.innerHTML);
res.send(bodyHTML);
} catch (e) {
console.log(e);
} finally {
await browser.close();
}
})();
});
Hope this helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With