I have a docker image containing a puppeteer web scraper. It works perfectly in my local machine when I build and run it. It also build fine in cloud build, deploys to cloud run and starts the http server. However, when I run one of the cron jobs dealing with a puppeteer instance, it times out with this error message:
(node:13) UnhandledPromiseRejectionWarning: TimeoutError: Timed out after 30000 ms while trying to connect to Chrome! The only Chrome revision guaranteed to work is r706915
Full log:
A 2019-12-03T15:12:27.748625Z (node:13) UnhandledPromiseRejectionWarning: TimeoutError: Timed out after 30000 ms while trying to connect to Chrome! The only Chrome revision guaranteed to work is r706915
A 2019-12-03T15:12:27.748692Z at Timeout.onTimeout (/node_modules/puppeteer/lib/Launcher.js:359:14)
A 2019-12-03T15:12:27.748705Z at ontimeout (timers.js:436:11)
A 2019-12-03T15:12:27.748716Z at tryOnTimeout (timers.js:300:5)
A 2019-12-03T15:12:27.748726Z at listOnTimeout (timers.js:263:5)
A 2019-12-03T15:12:27.748734Z at Timer.processTimers (timers.js:223:10)
This error happens directly on the puppeteer puppeteer.launch()
function.
I have tried to increase memory in the instance, different dockerfile setups (all from googling), different puppeteer instance arguments and try catching in prod.
I was using this as a base docker image (https://github.com/buildkite/docker-puppeteer), but it wasn't working so I decided to modify it for my own liking, and this is what I have so far:
Dockerfile
FROM node:10.15
RUN apt-get update && apt-get install -y wget --no-install-recommends \
&& wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install -y google-chrome-unstable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst ttf-freefont \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get purge --auto-remove -y curl \
&& rm -rf /src/*.deb
# RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
# RUN dpkg -i google-chrome-stable_current_amd64.deb; apt-get -fy install
# Copy package.json to docker image
COPY package.json ./
RUN npm install
# Copy source code of dir to image
COPY . .
ARG DOCKER_ENV
ENV NODE_ENV=${DOCKER_ENV}
EXPOSE 8080
CMD [ "npm", "run", "prod" ]
openBrowserInstance.js
const randomUserAgent = require(__dirname + '/randomUserAgent');
const randomProxy = require(__dirname + '/../multiple/randomProxy');
const puppeteer = require('puppeteer');
let defaultOptions = {
blockStyleAssets: true,
viewport: {
width: 1920,
height: 1080
},
urls: [''],
screenshotPath: null,
callback: null,
randomUserAgent: true,
randomProxy: true
};
module.exports = ( options, callback ) => {
return new Promise( async( resolve ) => {
options = Object.assign({}, defaultOptions, options);
// Required options
if ( options.urls.length < 1 || typeof callback === 'undefined' ) {
console.log('Missing one or more required options for "openBrowserInstance.js".');
resolve();
return;
}
let browserOptions = {
args: [`--proxy-server=http://${randomProxy()}`,'--lang=en-GB',
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage'],
headless: true
};
const browser = await puppeteer.launch( browserOptions );
const page = await browser.newPage();
await page.authenticate({username:'abrCKs', password:'ge2kCw'});
page.viewport( options.viewport );
if ( options.blockStyleAssets ) {
await page.setRequestInterception(true);
page.on('request', (req) => {
let resourceType = req.resourceType();
if (resourceType === 'image' || resourceType === 'stylesheet') {
req.abort();
} else {
req.continue();
}
});
}
for (const [index, url] of options.urls.entries()) {
let userAgent = null;
if ( options.randomUserAgent ) {
userAgent = randomUserAgent();
await page.setUserAgent( userAgent );
}
await page.goto( url, { waitUntil: 'networkidle0' } );
let pageContent = await page.content();
await callback(pageContent, url, index);
await page.close();
}
if ( options.screenshotPath !== null ) {
await page.screenshot({path: screenshotPath, fullPage: true});
}
await browser.close();
resolve();
})
};
cloudbuild.yaml
steps:
- name: 'gcr.io/cloud-builders/git'
args: ['clone', 'GIT-REPO-PLACEHOLDER']
- name: 'gcr.io/cloud-builders/docker'
args: ['build', '--build-arg', 'DOCKER_ENV=dev', '-t', 'eu.gcr.io/$PROJECT_ID/PROJECT-NAME-PLACEHOLDER', '.']
dir: 'PROJECT-NAME-PLACEHOLDER/'
- name: 'gcr.io/cloud-builders/docker'
args: ['push', 'eu.gcr.io/$PROJECT_ID/PROJECT-NAME-PLACEHOLDER']
- name: 'gcr.io/cloud-builders/gcloud'
args: ['beta', 'run', 'deploy', 'PROJECT-NAME-PLACEHOLDER', '--image', 'eu.gcr.io/$PROJECT_ID/PROJECT-NAME-PLACEHOLDER', '--region', 'europe-west1','--platform', 'managed', '--quiet', '--memory', '2G']
images:
- eu.gcr.io/$PROJECT_ID/PROJECT-NAME-PLACEHOLDER
Please let me know if you have any recommendations. I have also looked into Google Cloud Functions for this purpose but I wasn't sure if that would work either. If I cannot find a solution, I will be forced to run this on a VM instance, which is hilariously full circle..
Thank you for your time.
Puppeteer is a Node.js library which provides a high-level API to control Chrome/Chromium over the DevTools Protocol. Puppeteer runs in headless mode by default, but can be configured to run in full (non-headless) Chrome/Chromium.
Here is a fully functional sample running Puppeteer on Cloud Run:
screenshot.js
:
const puppeteer = require('puppeteer');
exports.screenshot = async (req, res) => {
const url = req.query.url;
if (!url) {
return res.send('Please provide URL as GET parameter, for example: <a href="?url=https://example.com">?url=https://example.com</a>');
}
const browser = await puppeteer.launch({
args: ['--no-sandbox']
});
const page = await browser.newPage();
await page.goto(url);
const imageBuffer = await page.screenshot();
browser.close();
res.set('Content-Type', 'image/png');
res.send(imageBuffer);
};
server.js
:
'use strict';
const {screenshot} = require('./screenshot.js')
const express = require('express');
const puppeteer = require('puppeteer');
const app = express();
app.use(screenshot);
const server = app.listen(process.env.PORT || 8080, err => {
if (err) return console.error(err);
const port = server.address().port;
console.info(`App listening on port ${port}`);
});
module.exports = app;
package.json
:
{
"name": "screenshot",
"version": "1.0.0",
"description": "Takes screenshot of the given URL.",
"author": "Steren",
"scripts": {
"start": "node server.js"
},
"license": "Apache-2.0",
"dependencies": {
"express": "^4.16.4",
"puppeteer": "^1.10.0"
}
}
Dockerfile
:
FROM node:10
# Adds required libs
RUN apt-get update && \
apt-get install -yq gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 \
libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 \
libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 \
libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 \
ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
# Start the app
WORKDIR /usr/src/app
COPY package*.json ./
ENV NODE_ENV=production
RUN npm install --production
COPY . .
CMD [ "npm", "start" ]
So after an entire day of debugging the answer was right here: https://github.com/puppeteer/puppeteer/blob/master/docs/troubleshooting.md#running-puppeteer-in-docker
FROM node:10-slim
# Install latest chrome dev package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
# Note: this installs the necessary libs to make the bundled version of Chromium that Puppeteer
# installs, work.
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
&& sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install -y google-chrome-unstable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# If running Docker >= 1.13.0 use docker run's --init arg to reap zombie processes, otherwise
# uncomment the following lines to have `dumb-init` as PID 1
# ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 /usr/local/bin/dumb-init
# RUN chmod +x /usr/local/bin/dumb-init
# ENTRYPOINT ["dumb-init", "--"]
# Uncomment to skip the chromium download when installing puppeteer. If you do,
# you'll need to launch puppeteer with:
# browser.launch({executablePath: 'google-chrome-unstable'})
# ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
# Install puppeteer so it's available in the container.
RUN npm i puppeteer \
# Add user so we don't need --no-sandbox.
# same layer as npm install to keep re-chowned files from using up several hundred MBs more space
&& groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
&& mkdir -p /home/pptruser/Downloads \
&& chown -R pptruser:pptruser /home/pptruser \
&& chown -R pptruser:pptruser /node_modules
# Run everything after as non-privileged user.
USER pptruser
CMD ["google-chrome-unstable"]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With