
Chrome --headless for AWS Lambda?

Chrome's headless mode is already available for Linux: https://chromium.googlesource.com/chromium/src/+/lkgr/headless/README.md

It works only with Canary right now, but it is coming officially in Chrome 57.

Is there any chance of running Google Chrome on AWS Lambda?

Sergey Babochkin asked Mar 07 '17 12:03




1 Answer

Yes; it's possible.

Compiling a non-debug build of Headless Chrome yields a binary that's ~125 MB, and just under 44 MB when gzipped. This means it fits within Lambda's deployment package limits of 250 MB uncompressed and 50 MB compressed.

What's (currently) required is to force Chrome to compile without using shared memory at /dev/shm. There's a thread on the topic on the headless-dev Google group here.
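As a rough sanity check of the size arithmetic above, here is a minimal bash sketch. The function names are made up for illustration, and the limits are interpreted as MiB, which is an assumption; check the current Lambda limits before relying on the exact byte counts.

```shell
#!/usr/bin/env bash
# Limits quoted in the answer, interpreted here as MiB (an assumption).
MAX_UNZIPPED_BYTES=$((250 * 1024 * 1024))
MAX_ZIPPED_BYTES=$((50 * 1024 * 1024))

# fits_lambda_limits UNZIPPED_BYTES ZIPPED_BYTES
# Succeeds only if both sizes are within the limits above.
fits_lambda_limits() {
  local unzipped=$1 zipped=$2
  [ "$unzipped" -le "$MAX_UNZIPPED_BYTES" ] && [ "$zipped" -le "$MAX_ZIPPED_BYTES" ]
}

# check_binary PATH
# Measures the raw size and the gzip-compressed size of a file and reports
# whether it would fit. (hypothetical helper, not part of any AWS tooling)
check_binary() {
  local file=$1 raw gz
  raw=$(( $(wc -c < "$file") ))
  gz=$(( $(gzip -c "$file" | wc -c) ))
  if fits_lambda_limits "$raw" "$gz"; then
    echo "OK: ${raw} bytes raw, ${gz} bytes gzipped"
  else
    echo "Too big: ${raw} bytes raw, ${gz} bytes gzipped"
    return 1
  fi
}
```

Once the build described below has finished, something like `check_binary out/Headless/headless_shell` would report both numbers.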

Here are steps I've used to build a binary of headless Chrome that will work on AWS Lambda. They're based on this and this.

  1. Create a new EC2 instance using the community AMI with name amzn-ami-hvm-2016.03.3.x86_64-gp2 (us-west-2 ami-7172b611).
  2. Pick an Instance Type with at least 16 GB of memory. Compile time will take about 4-5 hours on a t2.xlarge, or 2-3ish on a t2.2xlarge or about 45 min on a c4.4xlarge.
  3. Give yourself a Root Volume that's at least 30 GB (40 GB if you want to compile a debug build—which you won't be able to upload to Lambda because it's too big.)
  4. SSH into the new instance and run:
printf "LANG=en_US.utf-8\nLC_ALL=en_US.utf-8\n" | sudo tee -a /etc/environment > /dev/null
sudo yum install -y git redhat-lsb python bzip2 tar pkgconfig atk-devel alsa-lib-devel bison binutils brlapi-devel bluez-libs-devel bzip2-devel cairo-devel cups-devel dbus-devel dbus-glib-devel expat-devel fontconfig-devel freetype-devel gcc-c++ GConf2-devel glib2-devel glibc.i686 gperf glib2-devel gtk2-devel gtk3-devel java-1.*.0-openjdk-devel libatomic libcap-devel libffi-devel libgcc.i686 libgnome-keyring-devel libjpeg-devel libstdc++.i686 libX11-devel libXScrnSaver-devel libXtst-devel libxkbcommon-x11-devel ncurses-compat-libs nspr-devel nss-devel pam-devel pango-devel pciutils-devel pulseaudio-libs-devel zlib.i686 httpd mod_ssl php php-cli python-psutil wdiff --enablerepo=epel

Yum will complain about some packages not existing. Whatever. I haven't looked into them. It didn't seem to stop me from building headless_shell, though. Ignore whiny little Yum and move on. Next:

git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
echo 'export PATH=$PATH:$HOME/depot_tools' >> ~/.bash_profile
source ~/.bash_profile
mkdir Chromium && cd Chromium
fetch --no-history chromium
cd src

At this point we need to make a very small change to the Chrome code. By default on Linux, Chrome assumes there to be a tmpfs at /dev/shm. There is no tmpfs available to a Lambda function. :-(
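To see the problem from inside a function, a small diagnostic along these lines could be run first. This is only a sketch: the function names are hypothetical, and it mirrors the idea of "use /dev/shm if it works, otherwise the temp dir" rather than what Chrome itself does (Chrome does not probe for writability).

```shell
#!/usr/bin/env bash
# shm_dir_usable [DIR]
# Succeeds only if DIR exists and a file can actually be created in it.
shm_dir_usable() {
  local dir=${1:-/dev/shm} probe
  [ -d "$dir" ] || return 1
  probe=$(mktemp "${dir}/probe.XXXXXX" 2> /dev/null) || return 1
  rm -f "$probe"
}

# pick_shmem_dir [SHM_DIR] [FALLBACK]
# Prints the directory that shared-memory files could be placed in.
pick_shmem_dir() {
  local shm=${1:-/dev/shm} fallback=${2:-/tmp}
  if shm_dir_usable "$shm"; then
    echo "$shm"
  else
    echo "$fallback"
  fi
}
```

Calling `pick_shmem_dir` with no arguments prints `/tmp` on a system without a usable /dev/shm, which is the behaviour the code change below forces unconditionally.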

The file we have to change is src/base/files/file_util_posix.cc. Modify GetShmemTempDir() such that it always returns the OS's temp dir (/tmp). A simple way to do this is to just remove the entire #if defined(OS_LINUX) block in the GetShmemTempDir() function. A less drastic change is to hardcode use_dev_shm to false:

bool GetShmemTempDir(bool executable, FilePath* path) {
#if defined(OS_LINUX)
  bool use_dev_shm = true;
  if (executable) {
    static const bool s_dev_shm_executable = DetermineDevShmExecutable();
    use_dev_shm = s_dev_shm_executable;
  }

  // Lambda has no tmpfs at /dev/shm, so always fall through to GetTempDir().
  use_dev_shm = false;  // <-- add this. Yes, it's pretty hack-y.

  if (use_dev_shm) {
    *path = FilePath("/dev/shm");
    return true;
  }
#endif
  return GetTempDir(path);
}

With that change, it's time to compile. Picking things back up in the src directory, set some compile arguments and then (the last command) start the build process.

mkdir -p out/Headless
echo 'import("//build/args/headless.gn")' > out/Headless/args.gn
echo 'is_debug = false' >> out/Headless/args.gn
echo 'symbol_level = 0' >> out/Headless/args.gn
echo 'is_component_build = false' >> out/Headless/args.gn
echo 'remove_webcore_debug_symbols = true' >> out/Headless/args.gn
echo 'enable_nacl = false' >> out/Headless/args.gn
gn gen out/Headless
ninja -C out/Headless headless_shell

Finally we make a tarball of the relevant file(s) we'll need to run in Lambda.

mkdir out/headless-chrome && cd out
cp Headless/headless_shell Headless/libosmesa.so headless-chrome/
tar -zcvf chrome-headless-lambda-linux-x64.tar.gz headless-chrome/

Within Lambda, run headless_shell with the remote debugger interface enabled by executing:

/path/to/headless_shell --disable-gpu --no-sandbox --remote-debugging-port=9222 --user-data-dir=/tmp/user-data --single-process --data-path=/tmp/data-path --homedir=/tmp --disk-cache-dir=/tmp/cache-dir

Since /tmp is the only writeable place in a Lambda function, there are a bunch of flags just telling Chrome where to dump its data. They're not strictly necessary, but they keep Chrome happy. Note also that it's been mentioned that with the --disable-gpu flag, we don't need libosmesa.so, the omission of which would shave off about 4 MB from our package zip.
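A minimal bash sketch of wrapping that command: it builds the same flag list, starts the binary in the background, and polls the debugger port until it answers. The function names are hypothetical, the presence of curl in the runtime is an assumption, and /json/version is used here as the DevTools HTTP endpoint to probe.

```shell
#!/usr/bin/env bash
# chrome_flags [PORT] [TMP_DIR]
# Prints the flags from the answer, one per line; all paths live under TMP_DIR.
chrome_flags() {
  local port=${1:-9222} tmp=${2:-/tmp}
  printf '%s\n' \
    --disable-gpu \
    --no-sandbox \
    "--remote-debugging-port=${port}" \
    "--user-data-dir=${tmp}/user-data" \
    --single-process \
    "--data-path=${tmp}/data-path" \
    "--homedir=${tmp}" \
    "--disk-cache-dir=${tmp}/cache-dir"
}

# start_chrome BINARY [PORT]
# Launches BINARY in the background and waits (up to ~5 s) for the remote
# debugger to respond. Prints the PID on success.
start_chrome() {
  local binary=$1 port=${2:-9222} flags=() line pid attempt
  while IFS= read -r line; do
    flags+=("$line")
  done < <(chrome_flags "$port")
  "$binary" "${flags[@]}" &
  pid=$!
  for attempt in $(seq 1 50); do
    if curl -s "http://127.0.0.1:${port}/json/version" > /dev/null; then
      echo "$pid"
      return 0
    fi
    sleep 0.1
  done
  kill "$pid" 2> /dev/null
  return 1
}
```

Usage would look like `pid=$(start_chrome /path/to/headless_shell)`, after which a client can connect to port 9222.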

I've started this project with the aim of making it easier to get started. It comes with a pre-built headless Chrome binary which you can get here.

Marco Lüthy answered Oct 08 '22 15:10