
Node.js + Puppeteer on Docker, No usable sandbox

I'm building a Node.js LTS application. I followed the Puppeteer documentation, so my Dockerfile looks like this:

FROM node:12.18.0

WORKDIR /home/node/app
ADD package*.json ./

# Install latest chrome dev package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
# Note: this installs the necessary libs to make the bundled version of Chromium that Puppeteer
# installs, work.
RUN apt-get update \
    && apt-get install -y wget gnupg \
    && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
    && apt-get update \
    && apt-get install -y google-chrome-unstable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf \
      --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*

# Install node modules
RUN npm i

# Add user so we don't need --no-sandbox.
RUN groupadd -r -f audio \
    && groupadd -r -f video \
    && usermod -a -G audio,video node \
    && mkdir -p /home/node/Downloads \
    && chown -R node:node /home/node

USER node

CMD ["google-chrome-unstable"]

The application builds and runs fine, but as soon as I try to start the browser with await puppeteer.launch(); I get this error:

pdf    | Error: Failed to launch the browser process!
pdf    | [0612/133635.958777:FATAL:zygote_host_impl_linux.cc(116)] No usable sandbox! Update your kernel or see https://chromium.googlesource.com/chromium/src/+/master/docs/linux/suid_sandbox_development.md for more information on developing with the SUID sandbox. If you want to live dangerously and need an immediate workaround, you can try using --no-sandbox.
pdf    | #0 0x5638d5faa399 base::debug::CollectStackTrace()
pdf    | #1 0x5638d5f0b2a3 base::debug::StackTrace::StackTrace()
pdf    | #2 0x5638d5f1cc95 logging::LogMessage::~LogMessage()
pdf    | #3 0x5638d77f940e service_manager::ZygoteHostImpl::Init()
pdf    | #4 0x5638d5ad5060 content::ContentMainRunnerImpl::Initialize()
pdf    | #5 0x5638d5b365e7 service_manager::Main()
pdf    | #6 0x5638d5ad3631 content::ContentMain()
pdf    | #7 0x5638d5b3580d headless::(anonymous namespace)::RunContentMain()
pdf    | #8 0x5638d5b3550c headless::HeadlessShellMain()
pdf    | #9 0x5638d35295a7 ChromeMain
pdf    | #10 0x7fc01f0492e1 __libc_start_main
pdf    | #11 0x5638d35293ea _start
pdf    | 
pdf    | Received signal 6
pdf    | #0 0x5638d5faa399 base::debug::CollectStackTrace()
pdf    | #1 0x5638d5f0b2a3 base::debug::StackTrace::StackTrace()
pdf    | #2 0x5638d5fa9f35 base::debug::(anonymous namespace)::StackDumpSignalHandler()
pdf    | #3 0x7fc0255f30e0 (/lib/x86_64-linux-gnu/libpthread-2.24.so+0x110df)
pdf    | #4 0x7fc01f05bfff gsignal
pdf    | #5 0x7fc01f05d42a abort
pdf    | #6 0x5638d5fa8e95 base::debug::BreakDebugger()
pdf    | #7 0x5638d5f1d132 logging::LogMessage::~LogMessage()
pdf    | #8 0x5638d77f940e service_manager::ZygoteHostImpl::Init()
pdf    | #9 0x5638d5ad5060 content::ContentMainRunnerImpl::Initialize()
pdf    | #10 0x5638d5b365e7 service_manager::Main()
pdf    | #11 0x5638d5ad3631 content::ContentMain()
pdf    | #12 0x5638d5b3580d headless::(anonymous namespace)::RunContentMain()
pdf    | #13 0x5638d5b3550c headless::HeadlessShellMain()
pdf    | #14 0x5638d35295a7 ChromeMain
pdf    | #15 0x7fc01f0492e1 __libc_start_main
pdf    | #16 0x5638d35293ea _start
pdf    |   r8: 0000000000000000  r9: 00007ffcd14664d0 r10: 0000000000000008 r11: 0000000000000246
pdf    |  r12: 00007ffcd1467788 r13: 00007ffcd1466760 r14: 00007ffcd1467790 r15: aaaaaaaaaaaaaaaa
pdf    |   di: 0000000000000002  si: 00007ffcd14664d0  bp: 00007ffcd1466710  bx: 0000000000000006
pdf    |   dx: 0000000000000000  ax: 0000000000000000  cx: 00007fc01f05bfff  sp: 00007ffcd1466548
pdf    |   ip: 00007fc01f05bfff efl: 0000000000000246 cgf: 002b000000000033 erf: 0000000000000000
pdf    |  trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
pdf    | [end of stack trace]
pdf    | Calling _exit(1). Core file will not be generated.
pdf    | 
pdf    | 
pdf    | TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/master/docs/troubleshooting.md
pdf    | 
pdf    |     at onClose (/home/node/app/node_modules/puppeteer/lib/launcher/BrowserRunner.js:159:20)
pdf    |     at Interface.<anonymous> (/home/node/app/node_modules/puppeteer/lib/launcher/BrowserRunner.js:149:65)
pdf    |     at Interface.emit (events.js:327:22)
pdf    |     at Interface.close (readline.js:416:8)
pdf    |     at Socket.onend (readline.js:194:10)
pdf    |     at Socket.emit (events.js:327:22)
pdf    |     at endReadableNT (_stream_readable.js:1221:12)
pdf    |     at processTicksAndRejections (internal/process/task_queues.js:84:21)

(The container name is pdf, hence the prefix on each log line.)

I looked at the Puppeteer troubleshooting page as suggested, but I didn't find a solution there.

Any suggestions?

asked Jun 12 '20 13:06 by Tx_monster



3 Answers

You should pass the --no-sandbox and --disable-setuid-sandbox args when launching the browser. Below are my Dockerfile and a small script; they run successfully.

You can learn more about running Puppeteer in Docker from these references:

  1. https://github.com/buildkite/docker-puppeteer
  2. https://github.com/alekzonder/docker-puppeteer

Dockerfile


FROM node:12.18.0

RUN  apt-get update \
     && apt-get install -y wget gnupg ca-certificates \
     && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
     && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
     && apt-get update \
     # We install Chrome to get all the OS level dependencies, but Chrome itself
     # is not actually used as it's packaged in the node puppeteer library.
     # Alternatively, we could include the entire dep list ourselves
     # (https://github.com/puppeteer/puppeteer/blob/master/docs/troubleshooting.md#chrome-headless-doesnt-launch-on-unix)
     # but that seems too easy to get out of date.
     && apt-get install -y google-chrome-stable \
     && rm -rf /var/lib/apt/lists/* \
     && wget --quiet https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh -O /usr/sbin/wait-for-it.sh \
     && chmod +x /usr/sbin/wait-for-it.sh

# Install Puppeteer under /node_modules so it's available system-wide
ADD package.json package-lock.json /
RUN npm install

CMD ["node", "index.js"]

index.js

const puppeteer = require('puppeteer');

(async() => {

    const browser = await puppeteer.launch({
        args: [
            '--no-sandbox',
            '--disable-setuid-sandbox'
        ]
    });

    const page = await browser.newPage();

    await page.goto('https://www.google.com/', {waitUntil: 'networkidle2'});

    // close() returns a promise; await it so the process exits cleanly
    await browser.close();

})();
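
Because these two flags turn off Chrome's sandbox entirely, it may be worth making them opt-in rather than hard-coding them. The following is only a minimal sketch: the helper name buildLaunchOptions and the environment variable PUPPETEER_NO_SANDBOX are assumptions made up for this example, not something Puppeteer itself reads.

```javascript
// Hypothetical helper: builds the options object passed to puppeteer.launch().
// PUPPETEER_NO_SANDBOX is an assumed variable name chosen for this sketch;
// Puppeteer does not know about it.
function buildLaunchOptions(env = process.env) {
  const args = [];
  if (env.PUPPETEER_NO_SANDBOX === '1') {
    // The two flags from the script above; both disable Chrome's sandbox.
    args.push('--no-sandbox', '--disable-setuid-sandbox');
  }
  return { args };
}

// Usage (inside an async function, with puppeteer already required):
//   const browser = await puppeteer.launch(buildLaunchOptions());
```

With this, the sandbox stays on unless the container is started with the variable set to 1, so the unsandboxed mode is a visible, deliberate choice per environment.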
answered Oct 23 '22 11:10 by Ahmed ElMetwally


I found a way to keep the Chrome sandbox enabled, thanks to usethe4ce's answer here.

Initially I needed to install Chrome separately from Puppeteer, so I edited my Dockerfile as follows:

FROM node:12.18.0

WORKDIR /home/runner/app
ADD package*.json ./

# Install latest chrome dev package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
# Note: this installs the necessary libs to make the bundled version of Chromium that Puppeteer
# installs, work.
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
    && apt-get update \
    && apt-get install -y google-chrome-unstable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst ttf-freefont \
        --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*

# Skip the Chromium download when installing puppeteer. With this set,
# you need to launch puppeteer with:
#     puppeteer.launch({executablePath: 'google-chrome-unstable'})
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true


# Install node modules
RUN npm i \
    # Add user so we don't need --no-sandbox.
    # same layer as npm install to keep re-chowned files from using up several hundred MBs more space
    && groupadd -r runner && useradd -r -g runner -G audio,video runner \
    && mkdir -p /home/runner/Downloads \
    && chown -R runner:runner /home/runner \
    && chown -R runner:runner /home/runner/app/node_modules

USER runner

CMD ["google-chrome-unstable"]

With that change, the error went from No usable sandbox to:

Failed to move to new namespace: PID namespaces supported, Network namespace supported, but failed: errno = Operation not permitted

Then I followed the advice in usethe4ce's answer. By default, Docker blocks some kernel-level operations; a custom seccomp profile "unlocks" the ones Chrome needs to create its own sandbox. So I added this chrome.json file to my repo and edited my docker-compose file as follows:

version: "3.8"

services:
  <service name>:
    build:
      <build options>
    init: true
    security_opt: 
      - seccomp=<path to chrome.json file>
    [...]

If you are not using a docker-compose file you can run your container using the option --security-opt seccomp=path/to/chrome.json as suggested in the linked answer.

Finally launch the browser using:

await puppeteer.launch({
  executablePath: 'google-chrome-unstable'
});
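
The two failures seen so far can be told apart from the text Chrome prints on startup. Here is a rough sketch of a helper that does this; the function name and the returned labels are illustrative only, and it assumes (as in the log quoted in the question) that Chrome's stderr ends up in the message of the error thrown by puppeteer.launch().

```javascript
// Illustrative helper (not part of Puppeteer): classify a launch failure
// using the Chrome messages quoted in this thread.
function classifySandboxError(message) {
  const text = String(message);
  if (text.includes('Failed to move to new namespace')) {
    // Chrome tried to build its sandbox but the container blocked it;
    // this is the case the seccomp profile above addresses.
    return 'namespace-blocked';
  }
  if (text.includes('No usable sandbox')) {
    // Chrome found no sandbox mechanism it could use at all.
    return 'no-usable-sandbox';
  }
  return 'unknown';
}

// Usage (inside an async function, with puppeteer already required):
//   try {
//     const browser = await puppeteer.launch();
//   } catch (err) {
//     console.error(classifySandboxError(err.message), err);
//   }
```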

Edit:

Using a custom installation of Chrome is not a good idea, because its version might not be fully supported by Puppeteer. The only version guaranteed to work with a specific Puppeteer version is the bundled one.

So I suggest using security_opt as above and simply ignoring the custom installation part.

answered Oct 23 '22 13:10 by Tx_monster


I finally found out how to run it with the sandbox, but only on my local machine. I just had to read and apply the documentation in the official GitHub repo.

The part I was missing was to run the image with the --cap-add=SYS_ADMIN option:

docker run --cap-add=SYS_ADMIN <YOUR_IMAGE_NAME>

However, this looks like a security flaw, because it seems to give your container some access to your host. That is not necessarily what you want if you are reading this precisely because you want to keep the Chrome sandbox.

My final use case is to run my container on Cloud Run, and I doubt they would allow such a flag. I'll edit my answer if I finally get it to work on Cloud Run with the sandbox...

EDIT: Never mind, it just works without any flag on Cloud Run! So I'll keep using the --cap-add=SYS_ADMIN flag on my development machine, which seems fine to me.


Here is my complete Dockerfile that works for me right now:

FROM node:14-slim

WORKDIR /app

RUN apt-get update \
    && apt-get install -y wget gnupg \
    && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
    && apt-get update \
    && apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst fonts-freefont-ttf libxss1 \
      --no-install-recommends \
    && rm -rf /var/lib/apt/lists/*

COPY . .

RUN yarn \
    && groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
    && mkdir -p /home/pptruser/Downloads \
    && chown -R pptruser:pptruser /home/pptruser \
    && chown -R pptruser:pptruser /app

# Run everything after as non-privileged user.
USER pptruser

CMD node src/index.js

And my src/index.js file:

const puppeteer = require('puppeteer')

const main = async () => {
  console.log('Starting browser')
  const browser = await puppeteer.launch()
  console.log('Opening a new page')
  const page = await browser.newPage()
  console.log('Navigating to google')
  await page.goto('https://www.google.fr', {
    waitUntil: 'networkidle2'
  })
  console.log('closing browser')
  await browser.close()
}

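// Report launch failures and exit non-zero instead of leaving an unhandled rejection
main().catch((err) => {
  console.error(err)
  process.exit(1)
})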
answered Oct 23 '22 12:10 by Hammerbot