Docker: using container with headless Selenium Chromedriver

I'm trying to link peroumal1's "docker-chrome-selenium" container to another container with scraping code that uses Selenium.

He exposes his container to port 4444 (the default for Selenium), but I'm having trouble accessing it from my scraper container. Here's my docker-compose file:

chromedriver:
  image: eperoumalnaik/docker-chrome-selenium:latest

scraper:
  build: .
  command: python manage.py scrapy crawl general_course_content
  volumes:
    - .:/code
  ports:
    - "8000:8000"
  links:
    - chromedriver

and here's my scraper Dockerfile:

FROM python:2.7

RUN mkdir /code
WORKDIR /code
ADD requirements.txt /code/

RUN pip install --upgrade pip
RUN pip install -r requirements.txt
ADD . /code/

When I try to use Selenium from my code (see below), however, I get the following error message:

selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be available in the path. Please look at http://docs.seleniumhq.org/download/#thirdPartyDrivers and read up at http://code.google.com/p/selenium/wiki/ChromeDriver

On Mac OS X, when I wasn't using Docker, I fixed this by downloading the chromedriver binary and adding it to my PATH, but I don't know what to do here.

from selenium import webdriver

driver = webdriver.Chrome()
driver.maximize_window()
driver.get('http://google.com')
driver.close()

Edit: I'm also trying to do this with Selenium's official images and, unfortunately, it's not working either (the same error message asking for the chromedriver binary appears).

Is there something that needs to be done in the Python code?

Thank you!

Update: As @peroumal1 said, the problem was that I wasn't connecting to a remote driver with Selenium. Even after I did, however, I had connectivity problems (urllib2.URLError: <urlopen error [Errno 111] Connection refused>) until I changed the IP address the Selenium driver connects to and modified the docker-compose file. When using boot2docker, you have to connect to the virtual machine's IP instead of your computer's localhost; you can find it by running boot2docker ip. This is what I ended up with:

chromedriver:
  image: selenium/standalone-chrome
  ports:
    - "4444:4444"

scraper:
  build: .
  command: python manage.py scrapy crawl general_course_content
  volumes:
    - .:/code
  ports:
    - "8000:8000"
  links:
    - chromedriver

And the Python code (boot2docker's IP address on my computer is 192.168.59.103):

driver = webdriver.Remote(
           command_executor='http://192.168.59.103:4444/wd/hub',
           desired_capabilities=DesiredCapabilities.CHROME)
driver.maximize_window()
driver.get('http://google.com')
driver.close()
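
To avoid hard-coding the boot2docker IP, the hub address can also be read from an environment variable so the same code runs unchanged on a Linux host. A minimal sketch, assuming a hypothetical SELENIUM_HUB_URL variable that falls back to the boot2docker address above:

import os

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# SELENIUM_HUB_URL is a hypothetical variable name; fall back to the
# boot2docker VM address used above if it isn't set.
hub_url = os.environ.get('SELENIUM_HUB_URL',
                         'http://192.168.59.103:4444/wd/hub')

driver = webdriver.Remote(
    command_executor=hub_url,
    desired_capabilities=DesiredCapabilities.CHROME)
try:
    driver.get('http://google.com')
finally:
    driver.quit()  # quit() also tears down the remote session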
asked Apr 21 '15 by aralar

1 Answer

I think the issue here might not be Docker, but the code. The Selenium images provide an interface to a Selenium server through a remote WebDriver, while the code provided tries to instantiate a Chrome browser directly through chromedriver, which is possible with the Selenium Python bindings only if the chromedriver executable is accessible from the environment.

Maybe it would work better using the example from the docs:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

driver = webdriver.Remote(
    command_executor='http://127.0.0.1:4444/wd/hub',
    desired_capabilities=DesiredCapabilities.CHROME)
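
A note on the address: 127.0.0.1 only works when the hub runs on the same host as the code. From inside the scraper container in the question, docker-compose's links entry makes the hub reachable under the linked service name instead, so the connection would look like this (a sketch, assuming the chromedriver alias from the compose file):

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# 'chromedriver' resolves inside the scraper container thanks to
# the links: entry in the docker-compose file.
driver = webdriver.Remote(
    command_executor='http://chromedriver:4444/wd/hub',
    desired_capabilities=DesiredCapabilities.CHROME)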
answered Sep 18 '22 by peroumal1