I was trying to run headless Chrome browser using Selenium to scrape contents from the web. I installed headless Chrome using wget and then unzipped in my current folder.
!wget "http://chromedriver.storage.googleapis.com/2.25/chromedriver_linux64.zip"
!unzip chromedriver_linux64.zip
Now when I am loading the driver
from selenium.webdriver.chrome.options import Options
import os
# instantiate a chrome options object so you can set the size and headless preference
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--window-size=1920x1080")
chrome_driver = os.getcwd() +"/chromedriver"
driver = webdriver.Chrome(chrome_options=chrome_options,executable_path=chrome_driver)
I am getting an error
WebDriverException Traceback (most recent call last)
<ipython-input-67-0aeae0cfd891> in <module>()
----> 1 driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=chrome_driver)
2 driver.get("https://www.google.com")
3 lucky_button = driver.find_element_by_css_selector("[name=btnI]")
4 lucky_button.click()
5 /usr/local/lib/python3.6/dist-packages/selenium/webdriver/chrome/webdriver.py in __init__(self, executable_path, port, chrome_options, service_args, desired_capabilities, service_log_path)
60 service_args=service_args,
61 log_path=service_log_path)
---> 62 self.service.start()
63
64 try:
/usr/local/lib/python3.6/dist-packages/selenium/webdriver/common/service.py in start(self)
84 count = 0
85 while True:
---> 86 self.assert_process_still_running()
87 if self.is_connectable():
88 break
/usr/local/lib/python3.6/dist-packages/selenium/webdriver/common/service.py in assert_process_still_running(self)
97 raise WebDriverException(
98 'Service %s unexpectedly exited. Status code was: %s'
---> 99 % (self.path, return_code)
100 )
101
WebDriverException: Message: Service /content/chromedriver unexpectedly exited. Status code was: -6
So after some research I tried the other way
!apt install chromium-chromedriver
import selenium as se
options = se.webdriver.ChromeOptions()
options.add_argument('headless')
driver = se.webdriver.Chrome(chrome_options=options)
On Google Colab which again gives me the same error
WebDriverException: Message: Service chromedriver unexpectedly exited. Status code was: -6
I have found the answer to the question about why I was getting an error. Please install the chromium-chromedriver and add it to your path variable as well as the bin directory.
This is the fully-fledged solution to the problem of how to scrape data using Selenium on Colab. There is one more method by using PhantomJS but this API has been deprecated by Selenium and hopefully they will remove it in the next Selenium update.
# install chromium, its driver, and selenium
!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium
# set options to be headless, ..
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# open it, go to a website, and get results
wd = webdriver.Chrome('chromedriver',options=options)
wd.get("https://www.website.com")
print(wd.page_source) # results
This would work for anyone who want to scrape their data on Google Colab and not on your local machine. Please execute the steps as shown sequentially in the same order.
You can find the notebook here https://colab.research.google.com/drive/1GFJKhpOju_WLAgiVPCzCGTBVGMkyAjtk .
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With