I'm trying to link peroumal1's "docker-chrome-selenium" container to another container with scraping code that uses Selenium.
He exposes port 4444 (the default for Selenium) from his container, but I'm having trouble accessing it from my scraper container. Here's my docker-compose file:
chromedriver:
  image: eperoumalnaik/docker-chrome-selenium:latest
scraper:
  build: .
  command: python manage.py scrapy crawl general_course_content
  volumes:
    - .:/code
  ports:
    - "8000:8000"
  links:
    - chromedriver
and here's my scraper Dockerfile:
FROM python:2.7
RUN mkdir /code
WORKDIR /code
ADD requirements.txt /code/
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
ADD . /code/
When I try to use Selenium from my code (see below), however, I get the following error: selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be available in the path. Please look at http://docs.seleniumhq.org/download/#thirdPartyDrivers and read up at http://code.google.com/p/selenium/wiki/ChromeDriver. On Mac OS X, when I wasn't using Docker, I fixed this by downloading the chromedriver binary and adding it to the path, but I don't know what to do here.
from selenium import webdriver

driver = webdriver.Chrome()
driver.maximize_window()
driver.get('http://google.com')
driver.close()
Edit: I'm also trying to do this with Selenium's official images and, unfortunately, it's not working either (the same error message asking for the chromedriver binary appears).
Is there something that needs to be done on the Python code?
Thank you!
Update: As @peroumal1 said, the problem was that I wasn't connecting to a remote driver with Selenium. After I did, however, I had connectivity problems (urllib2.URLError: <urlopen error [Errno 111] Connection refused>) until I changed the IP address the Selenium driver connects to (when using boot2docker, you have to connect to the virtual machine's IP instead of your computer's localhost; you can find it by running boot2docker ip) and modified the docker-compose file. This is what I ended up with:
chromedriver:
  image: selenium/standalone-chrome
  ports:
    - "4444:4444"
scraper:
  build: .
  command: python manage.py scrapy crawl general_course_content
  volumes:
    - .:/code
  ports:
    - "8000:8000"
  links:
    - chromedriver
And the Python code (boot2docker's IP address on my computer is 192.168.59.103):
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

driver = webdriver.Remote(
    command_executor='http://192.168.59.103:4444/wd/hub',
    desired_capabilities=DesiredCapabilities.CHROME)
driver.maximize_window()
driver.get('http://google.com')
driver.close()
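As a side note, because the compose file links the scraper service to chromedriver, the Selenium hub should also be reachable from inside the scraper container under the hostname chromedriver, which avoids hard-coding the boot2docker IP. A minimal sketch (SELENIUM_HOST is a hypothetical environment variable introduced here for illustration, not something Selenium or Compose defines):

```python
import os

# Inside the scraper container, the docker-compose "links" entry makes the
# chromedriver service resolvable by name, so the hub URL can be built
# without hard-coding the boot2docker VM's IP. SELENIUM_HOST is a
# hypothetical override for running the script outside Docker.
selenium_host = os.environ.get('SELENIUM_HOST', 'chromedriver')
command_executor = 'http://%s:4444/wd/hub' % selenium_host
```

The resulting command_executor string can then be passed to webdriver.Remote exactly as in the snippet above.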
I think the issue here might not be Docker, but the code. The Selenium images provide an interface to a Selenium Server through remote WebDriver, while the code provided tries to instantiate a Chrome browser directly through chromedriver, which is possible with the Selenium Python bindings only when the chromedriver binary is accessible from the environment.
It would probably work better using the example from the docs:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

driver = webdriver.Remote(
    command_executor='http://127.0.0.1:4444/wd/hub',
    desired_capabilities=DesiredCapabilities.CHROME)
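One more hedged note: a "Connection refused" error can also appear simply because the scraper starts before the Selenium server inside the linked container has finished booting, since docker-compose starts the containers but does not wait for the services in them to become ready. A small sketch of a readiness poll (wait_for_selenium is an illustrative helper written here, not part of Selenium's API):

```python
import socket
import time


def wait_for_selenium(host, port=4444, timeout=30):
    """Poll until a Selenium server accepts TCP connections on host:port.

    docker-compose starts linked containers together, so the scraper may
    try to connect before the Selenium hub is listening. This helper
    blocks until the port opens or the timeout expires, returning True
    on success and False on timeout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            sock = socket.create_connection((host, port), timeout=2)
            sock.close()
            return True
        except socket.error:
            time.sleep(1)
    return False
```

Calling something like wait_for_selenium('chromedriver') before constructing webdriver.Remote gives the hub time to come up instead of failing immediately.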