How to use Selenium with Firefox to scrape websites?
echo "deb http://packages.linuxmint.com debian import" >> /etc/apt/sources.list && apt-get update
apt-get install firefox xvfb python-dev python-pip
pip install pyvirtualdisplay selenium
from pyvirtualdisplay import Display
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

# Start an invisible virtual X display (Xvfb) for Firefox to draw into
display = Display(visible=0, size=(800, 600))
display.start()


def init_driver():
    driver = webdriver.Firefox()
    # Attach an explicit wait (up to 5 seconds) to the driver for reuse
    driver.wait = WebDriverWait(driver, 5)
    return driver


def lookup(driver, query):
    driver.get("http://www.google.com")
    try:
        # Wait for the search box to exist and the search button to be clickable
        box = driver.wait.until(EC.presence_of_element_located(
            (By.NAME, "q")))
        button = driver.wait.until(EC.element_to_be_clickable(
            (By.NAME, "btnK")))
        box.send_keys(query)
        button.click()
    except TimeoutException:
        print("Box or Button not found in google.com")


if __name__ == "__main__":
    driver = init_driver()
    try:
        lookup(driver, "Selenium")
        time.sleep(5)
    finally:
        # Always close the browser and the virtual display, even on errors
        driver.quit()
        display.stop()
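Running the script as originally written, with the body of the try: block not indented, failed with:
  File "selenium_scrape.py", line 20
    box = driver.wait.until(EC.presence_of_element_located(
    ^
IndentationError: expected an indented block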
Headless Execution Firefox Driver
It is quite simple to run your tests in headless mode: you simply add the "--headless" argument to the FirefoxOptions object.
Firefox can be run in headless mode once the geckodriver path is configured. We then use the FirefoxOptions class and pass the headless setting to the browser with the setHeadless method, passing true as its parameter.
Mozilla Firefox is one of the most widely used browsers in the world. It has a rich feature set and is supported by many modern testing tools, including Selenium.
How to make Firefox headless programmatically in Selenium with Python? Create an Options object and set its headless property to True. Note that this property was deprecated in Selenium 4.8 and removed in later 4.x releases; on current versions, add the "-headless" argument to the Options object instead.
The difference with Chrome is that installing the browser alone is not enough: Selenium also needs a separate driver, chromedriver.
Download the latest version here: Chromedriver
You now have two options: move the downloaded chromedriver to a location where it is always accessible (option 1), or tell your script where to find it (option 2).
For option 1, move it to a directory on your PATH so it is found when you call webdriver.Chrome():
sudo mv /path/to/download/chromedriver /usr/bin
Then make it executable:
sudo chmod a+x /usr/bin/chromedriver
For option 2, define the path in your script:
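from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to the downloaded chromedriver binary; adjust to your download location
chromedriver_path = "/Users/you/Downloads/chromedriver"
# Selenium 4 takes the driver path through a Service object
# (on Selenium 3 this was webdriver.Chrome(executable_path=chromedriver_path))
driver = webdriver.Chrome(service=Service(executable_path=chromedriver_path))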
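Both options can be combined in a small helper that looks for chromedriver on PATH first (option 1) and otherwise falls back to an explicit download location (option 2). This sketch uses only the standard library; the function name resolve_chromedriver and its parameters are hypothetical.

```python
import os
import shutil
from typing import Optional


def resolve_chromedriver(fallback: Optional[str] = None,
                         search_path: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: locate chromedriver on PATH, then at an explicit path.

    search_path uses the PATH format (directories joined by os.pathsep); when it
    is None, shutil.which() searches the PATH environment variable.
    """
    # Option 1: an executable named chromedriver somewhere on PATH
    on_path = shutil.which("chromedriver", path=search_path)
    if on_path is not None:
        return on_path
    # Option 2: the hard-coded download location, if it exists and is executable
    if fallback is not None and os.path.isfile(fallback) and os.access(fallback, os.X_OK):
        return fallback
    return None
```

The returned path can be passed as Service(executable_path=...); a result of None means neither option has been set up yet.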