Running Selenium WebDriver in AWS Lambda with Python

I want to run BeautifulSoup and Selenium WebDriver in AWS Lambda, and my runtime is Python 3.6. Is it possible, and if so, how? My intention is to scrape data from a webpage using Beautiful Soup 4 and Selenium (Selenium is needed because the data is generated dynamically by JavaScript).

asked Apr 21 '18 at 07:04 by skysoft999

2 Answers

Yes, it's possible. You need to package a headless Chrome binary and chromedriver along with all the Python packages you need. You'll also need to set several options in Selenium's Chrome web driver to make it work.
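
As an illustration of the "several options" this answer mentions, here is a minimal sketch of a Lambda handler that starts headless Chrome through Selenium. It assumes a headless Chromium binary and a matching chromedriver are bundled with the function at the hypothetical paths `/opt/headless-chromium` and `/opt/chromedriver`, and that Selenium 4 is installed (Selenium 4 requires Python 3.7 or newer; with Selenium 3 the equivalent call is `webdriver.Chrome(executable_path=..., options=...)`). The flag list reflects Lambda's constraints (only `/tmp` is writable, `/dev/shm` is small, Chrome's sandbox is unavailable) and is not necessarily the exact set from the tutorial.

```python
"""Sketch: headless Chrome via Selenium inside an AWS Lambda handler.

Assumptions (hypothetical): Chromium and chromedriver are bundled at the
paths below, Selenium 4.x is in the deployment package, and the event
looks like {"url": "https://..."}.
"""

CHROME_BINARY = "/opt/headless-chromium"  # hypothetical bundled path
CHROMEDRIVER = "/opt/chromedriver"        # hypothetical bundled path


def lambda_chrome_arguments():
    """Return Chrome flags commonly needed in Lambda's restricted sandbox."""
    return [
        "--headless",                      # no display is available
        "--no-sandbox",                    # Chrome's own sandbox cannot start in Lambda
        "--disable-gpu",
        "--disable-dev-shm-usage",         # /dev/shm is tiny in Lambda
        "--single-process",
        "--window-size=1280,1696",
        "--user-data-dir=/tmp/user-data",  # /tmp is the only writable directory
        "--disk-cache-dir=/tmp/cache-dir",
    ]


def make_driver():
    # Imported lazily so the module can be loaded even where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.binary_location = CHROME_BINARY
    for arg in lambda_chrome_arguments():
        options.add_argument(arg)
    return webdriver.Chrome(service=Service(CHROMEDRIVER), options=options)


def lambda_handler(event, context):
    driver = make_driver()
    try:
        driver.get(event["url"])
        # page_source is the current DOM, including JavaScript-rendered content.
        return {"title": driver.title, "html_length": len(driver.page_source)}
    finally:
        driver.quit()  # always release the browser, even on errors
```

The `page_source` string can then be handed to `BeautifulSoup(html, "html.parser")` for parsing, exactly as you would with a static page.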

I wrote a step-by-step tutorial after spending several frustrating weeks trying to deploy it.

answered Sep 30 '22 at 17:09 by robroc

You will need to create a deployment package and upload it to Lambda if you are going to use dependencies outside of the standard library.
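
A minimal sketch of building such a .zip deployment package, assuming the third-party packages were first installed into a local folder with `pip install beautifulsoup4 -t build/` and the handler module (hypothetically `lambda_function.py`) was copied into the same folder. Lambda's Python runtime expects the handler module and packages at the root of the archive, so the helper zips the folder's contents rather than the folder itself. The folder and file names are illustrative only.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Zip the *contents* of src_dir so they sit at the root of the archive.

    zip_path should be outside src_dir, otherwise the archive would try to
    include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to src_dir, e.g. "bs4/__init__.py".
                zf.write(full_path, os.path.relpath(full_path, src_dir))
    return zip_path
```

For example, `zip_directory("build", "deployment.zip")` produces an archive that can be uploaded in the Lambda console or with the AWS CLI.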

I have a write-up about using BS4 and Lambda together. I did not use Selenium within Lambda, but I do have extensive Selenium experience. You will not be able to execute commands within a browser using Lambda; you are going to need a remote server running Selenium Server. Download Selenium and the WebDrivers onto the machine that will do the web scraping and start the .jar file; it will open a port on that machine, which Selenium will communicate with.
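
The remote-server setup described above can be sketched as a Lambda handler that drives the browser over the network and parses the result with Beautiful Soup. This is a minimal sketch under stated assumptions: the `SELENIUM_HOST` environment variable and its default hostname are hypothetical, Selenium Server is assumed to be listening on its default port 4444 on a machine you run yourself, and Selenium 4.x plus `beautifulsoup4` are assumed to be in the deployment package.

```python
"""Sketch: Lambda drives a remote Selenium Server, then parses with BS4.

Assumptions (hypothetical): SELENIUM_HOST names a machine running Selenium
Server on port 4444, and the event looks like {"url": "https://..."}.
"""
import os

SELENIUM_HOST = os.environ.get("SELENIUM_HOST", "selenium.example.internal")


def hub_url(host, port=4444):
    """Build the WebDriver endpoint of a standalone Selenium Server."""
    return "http://{}:{}/wd/hub".format(host, port)


def fetch_rendered_html(url):
    # Lazy import keeps hub_url() usable without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Remote(
        command_executor=hub_url(SELENIUM_HOST),
        options=webdriver.ChromeOptions(),  # browser runs on the remote machine
    )
    try:
        driver.get(url)
        return driver.page_source  # current DOM, including JavaScript output
    finally:
        driver.quit()


def lambda_handler(event, context):
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(fetch_rendered_html(event["url"]), "html.parser")
    return {"links": [a.get("href") for a in soup.find_all("a")]}
```

Only the lightweight `selenium` client and `bs4` travel with the function here; the browser and driver binaries live on the remote server.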

Since you will need a machine (probably running Windows) to fire up a browser and scrape these pages, you may not need Lambda in the end.

answered Sep 30 '22 at 18:09 by Nicholas Martinez