I am scraping a lot of information from the web, and I want it to run in the cloud, so I'd like to use Colaboratory. But it throws an error:
WebDriverException Traceback (most recent call last)
<ipython-input-35-abcc3b93dfa7> in <module>()
20 options.add_argument("--start-maximized");
21 options.add_argument("--headless");
---> 22 driver = webdriver.Chrome('chromedriver', chrome_options=options)
23
24 book = cd + "/target.xlsx"
/usr/local/lib/python3.6/dist-packages/selenium/webdriver/chrome/webdriver.py in __init__(self, executable_path, port, options, service_args, desired_capabilities, service_log_path, chrome_options, keep_alive)
71 service_args=service_args,
72 log_path=service_log_path)
---> 73 self.service.start()
74
75 try:
/usr/local/lib/python3.6/dist-packages/selenium/webdriver/common/service.py in start(self)
96 count = 0
97 while True:
---> 98 self.assert_process_still_running()
99 if self.is_connectable():
100 break
/usr/local/lib/python3.6/dist-packages/selenium/webdriver/common/service.py in assert_process_still_running(self)
109 raise WebDriverException(
110 'Service %s unexpectedly exited. Status code was: %s'
---> 111 % (self.path, return_code)
112 )
113
WebDriverException: Message: Service chromedriver unexpectedly exited. Status code was: -6
I read an article, "How can we use Selenium Webdriver in colab.research.google.com?", which says this works, but it doesn't for me.
Any ideas are appreciated.
My options are:
options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
driver = webdriver.Chrome('chromedriver', chrome_options=options)
↑ this last line raises the error:
WebDriverException: Message: Service chromedriver unexpectedly exited. Status code was: -6
============================================ My entire code
!sudo apt install unzip
!wget https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip
!unzip chromedriver_linux64.zip -d /usr/bin/
from google.colab import drive
drive.mount('/content/drive')
!pip install selenium
!pip install openpyxl
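(Side note: status code -6 corresponds to SIGABRT, i.e. the chromedriver process aborted on startup. A quick sanity check, assuming both binaries are on PATH, is to print the driver and browser versions and see whether they match:)
!chromedriver --version          # driver version (2.37 in this setup)
!chromium-browser --version      # errors if no browser is installed at all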
Then the Python script is:
cd = "drive/My Drive/doc/業務資料/イーコレ/scrape/*"
import os, subprocess
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
import selenium
import bs4
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup
import openpyxl
import time, re, csv, urllib.parse
options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
driver = webdriver.Chrome('chromedriver', chrome_options=options)
This is where running your Selenium program on Google Colab can help you greatly: it accomplishes the task at hand while freeing your local resources for other work, and Colab uses Google's internet connection to send requests and fetch data.
# install chromium, its driver, and selenium
!apt update
!apt install chromium-chromedriver
!pip install selenium
# set options to be headless, etc.
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# open it, go to a website, and get results
wd = webdriver.Chrome('chromedriver', options=options)
wd.get("https://www.website.com")
print(wd.page_source) # results
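Once the driver is up, the wait helpers imported in the question work as usual. A minimal sketch, assuming the target page has an element with id "content" (a hypothetical id, adjust it to your page):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

wd.get("https://www.website.com")
try:
    # wait up to 10 seconds for the (hypothetical) element to be present
    elem = WebDriverWait(wd, 10).until(
        EC.presence_of_element_located((By.ID, "content"))
    )
    print(elem.text)
except TimeoutException:
    print("element did not appear within 10 seconds")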
I wrapped all of this into a library:
!pip install kora
from kora.selenium import wd
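After that import, wd is an already-configured headless Chrome driver, so it is used exactly like the wd above (example.com is just a placeholder URL):
wd.get("https://example.com")
print(wd.title)        # quick check that the page loaded
print(wd.page_source)  # raw HTML, ready for BeautifulSoup if needed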