
Google Cloud Platform - Deploy a Cloud Function that starts a webdriver

I am defining a Cloud Function on GCP for scraping a website in Python.

I am starting simple, with a function that only opens the webdriver:

from selenium import webdriver

def launch_search(request):
    # Starting a webdriver
    driver = webdriver.Chrome()
    return 'Success'

This function doesn't work: when I trigger it, I get "Error: could not handle the request", probably because ChromeDriver is not installed on the remote machine. Therefore:

  • How can I install it?
  • Or can I scrape a webpage using Selenium, without opening the page with a webdriver?
ludmilaex asked Oct 03 '19 10:10



2 Answers

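Because Cloud Functions are serverless, you cannot control the server machine, so you cannot install ChromeDriver on it yourself. You can instead use services that give you control over the machine, such as Compute Engine (GCE) or Kubernetes Engine (GKE).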
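On a machine you control, such as a GCE VM, the function from the question could be adapted roughly as follows. This is only a sketch: it assumes Chrome and a matching ChromeDriver are already installed on the VM and that the selenium package is available. The flag list, the `url` parameter, and returning the page title are illustrative choices, not part of the original question.

```python
# Chrome flags commonly used on servers without a display (assumed choices).
HEADLESS_FLAGS = ["--headless", "--no-sandbox", "--disable-dev-shm-usage"]


def launch_search(url):
    """Open `url` in headless Chrome and return the page title."""
    # Imported inside the function so this module still loads on machines
    # where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for flag in HEADLESS_FLAGS:
        options.add_argument(flag)

    # Assumes ChromeDriver can be found by Selenium (for example, on PATH).
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always close the browser so Chrome processes do not pile up.
        driver.quit()
```

Calling `launch_search("https://example.com")` on such a VM should return that page's title; the URL is a placeholder.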

manasouza answered Oct 16 '22 12:10


You can't currently run Selenium scripts from a Python Cloud Function. A Feature Request for this is currently open in the Public Issue Tracker.

As an alternative, you can use Node.js with Puppeteer. I found a blog post that details such a use case.
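If the target page is plain HTML that does not depend on JavaScript rendering, a browser may not be needed at all. The sketch below stays in Python and uses only the standard library. It is an assumption-laden illustration rather than a drop-in answer: `TitleParser`, `extract_title`, and the `https://example.com` URL are hypothetical names chosen for the example, and it only extracts the page title.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the stripped <title> text of an HTML document string."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


def launch_search(request):
    """HTTP-triggered entry point: fetch a page and return its title."""
    # Placeholder URL; replace with the page you want to scrape.
    with urlopen("https://example.com", timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_title(html)
```

Because `extract_title` works on a string, it can be checked locally without any network access before deploying.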

Maxim answered Oct 16 '22 12:10