I am defining a Cloud Function on GCP to scrape a website in Python.
I am starting simple, with a function that just opens the webdriver:
from selenium import webdriver

def launch_search(request):
    # Start a Chrome webdriver
    driver = webdriver.Chrome()
    return 'Success'
This function doesn't work (I get "Error: could not handle the request"
when I trigger it), probably because ChromeDriver is not installed on the remote machine. Therefore:
1. In the Source code field, select "ZIP from Cloud Storage".
2. In the Cloud Storage location field, click Browse to select a ZIP file from Cloud Storage. Your function source files must be located at the root of the ZIP file - see Source directory structure.
3. Click Deploy.
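The same ZIP-based deployment can also be sketched from the command line with the gcloud CLI. This is only a sketch: the bucket name, object name, region, and runtime below are placeholders, not values from the original post.

```shell
# Sketch only: bucket, object, region, and runtime are hypothetical placeholders.
# Deploys the launch_search HTTP function from a ZIP already uploaded to Cloud Storage.
gcloud functions deploy launch_search \
    --runtime python39 \
    --trigger-http \
    --region us-central1 \
    --source gs://my-bucket/function-source.zip
```

Note that this deploys the code, but it does not solve the underlying problem: the Cloud Functions runtime still has no Chrome or ChromeDriver binary for Selenium to launch.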
To deploy a remote Selenium webdriver to Google Cloud Run, I follow a classic Docker workflow, but feel free to use this Cloud Build tutorial instead. First, pull the standalone Chrome Selenium image from Docker Hub. Then tag the image with the GCP Container Registry destination (you can also use gcr.io).
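The pull-and-tag workflow above can be sketched as the following commands; the project ID is a hypothetical placeholder, and the final push/deploy steps are my assumption about how the workflow continues:

```shell
# Pull the standalone Chrome + Selenium server image from Docker Hub.
docker pull selenium/standalone-chrome

# Tag it for GCP Container Registry ("my-project" is a placeholder project ID).
docker tag selenium/standalone-chrome gcr.io/my-project/standalone-chrome

# Assumed next steps: push the image, then deploy it to Cloud Run.
docker push gcr.io/my-project/standalone-chrome
gcloud run deploy standalone-chrome \
    --image gcr.io/my-project/standalone-chrome \
    --region us-central1 \
    --port 4444
```

Port 4444 is the default port the Selenium standalone server listens on; once deployed, a client can connect to the service URL with webdriver.Remote.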
With Cloud Functions you write simple, single-purpose functions that are attached to events emitted from your cloud infrastructure and services. Your function is triggered when an event being watched is fired. Your code executes in a fully managed environment.
Given that Cloud Functions is serverless, you are unable to control the server machine. You can instead use more configurable services such as GCE or GKE.
You can't currently use Python to run Selenium scripts on Cloud Functions. There's a Feature Request for this currently open in the Public Issue Tracker, which can be found here.
As an alternative, you can use Node.js with Puppeteer. I found this blog post that details a use case.