
Selenium can't connect to GhostDriver (but only sometimes)

I've set up a simple web-scraping script in Python with Selenium and PhantomJS. I have about 200 URLs to scrape in total. The script runs fine at first, but after about 20-30 URLs (sometimes more, sometimes fewer; the failure point seems random and isn't tied to any particular URL) I get the following error in Python:

selenium.common.exceptions.WebDriverException: Message: 'Can not connect to GhostDriver'

And my ghostdriver.log:

PhantomJS is launching GhostDriver...
[ERROR - 2014-07-04T17:27:37.519Z] GhostDriver - main.fail - {"message":"Could not start Ghost Driver","line":82,"sourceId":140692115795456,"sourceURL":":/ghostdriver/main.js","stack":"Error: Could not start Ghost Driver\n    at :/ghostdriver/main.js:82","stackArray":[{"sourceURL":":/ghostdriver/main.js","line":82}]}

I've searched, and most of the questions on SO seem to be from people who can't even load a single URL. The only other question I've found where the error occurs in the middle of the script is this one, and its answer is to upgrade PhantomJS to the latest version, which I've done. The other answer simply says to try that URL again, which doesn't seem like a good solution since the URL could simply fail again.

I am running PhantomJS 1.9.7 and Selenium 2.42.1 on Linux Mint 17 with Python 2.7.6.

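from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# about 200 URLs in total; the rest of the list is elided here
for url in ['example.com/1/', 'example.com/2/', 'example.com/3/']:
    user_agent = 'Chrome'
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap['phantomjs.page.settings.userAgent'] = user_agent
    # a new PhantomJS process is started on every iteration
    driver = webdriver.PhantomJS(executable_path='/usr/bin/phantomjs', desired_capabilities=dcap)
    driver.get(url)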
user_78361084 asked Jul 04 '14 17:07


1 Answer

I had the same problem. To fix it, I built PhantomJS from source.

For Linux (Debian):
sudo apt-get update
sudo apt-get install build-essential chrpath git-core libssl-dev libfontconfig1-dev libxft-dev
git clone https://github.com/ariya/phantomjs.git
cd phantomjs
git checkout 1.9
./build.sh

For macOS:
git clone https://github.com/ariya/phantomjs.git
cd phantomjs
git checkout 1.9
./build.sh

For other systems, see http://phantomjs.org/build.html

Optional (install the built binary system-wide):
cd bin
chmod +x phantomjs
sudo cp phantomjs /usr/bin/
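
After copying the binary, it may be worth confirming that the path Selenium is given really points at the freshly built PhantomJS. The sketch below is only an illustration: the helper names parse_version and phantomjs_version are hypothetical, and it assumes the binary prints a plain dotted version such as 1.9.7 when run with --version.

```python
import re
import subprocess


def parse_version(output):
    """Turn PhantomJS's --version output, e.g. '1.9.7', into a tuple of ints."""
    text = output.decode() if isinstance(output, bytes) else output
    match = re.match(r'(\d+)\.(\d+)\.(\d+)', text.strip())
    if match is None:
        raise ValueError('unexpected version output: %r' % text)
    return tuple(int(part) for part in match.groups())


def phantomjs_version(executable_path='/usr/bin/phantomjs'):
    """Run the given binary with --version and return the parsed version."""
    # The default path matches the executable_path used in the question.
    return parse_version(subprocess.check_output([executable_path, '--version']))
```

Calling phantomjs_version() with the same executable_path that is passed to webdriver.PhantomJS shows which build the script will actually launch.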

I figured this out when I read my ghostdriver.log file, which said:

[ERROR - 2014-09-04T19:33:30.842Z] GhostDriver - main.fail - {"message":"Could not start Ghost Driver","line":82,"sourceId":140145669488128,"sourceURL":":/ghostdriver/main.js","stack":"Error: Could not start Ghost Driver\n    at :/ghostdriver/main.js:82","stackArray":[{"sourceURL":":/ghostdriver/main.js","line":82}]}

I was sure there must be some missing files that it needs for certain edge cases. So I decided to build from source, and it's working fine now.
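
As a side note, the loop in the question starts a new PhantomJS process for every URL and never calls quit() on the driver. Whether that contributes to the intermittent failure is an assumption, not something verified here, but it is cheap to rule out. A minimal sketch that reuses one driver and always shuts it down could look like this; scrape_all and make_phantomjs_driver are hypothetical names, and the PhantomJS setup simply mirrors the question's snippet for Selenium 2.x.

```python
def scrape_all(urls, make_driver):
    """Fetch every URL with a single driver and return {url: page source}."""
    driver = make_driver()  # start the browser once, not once per URL
    pages = {}
    try:
        for url in urls:
            driver.get(url)
            pages[url] = driver.page_source
    finally:
        driver.quit()  # always shut the browser process down, even on errors
    return pages


def make_phantomjs_driver(executable_path='/usr/bin/phantomjs', user_agent='Chrome'):
    """Build a PhantomJS driver the same way as the question's snippet."""
    # Imported lazily so scrape_all can be exercised without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap['phantomjs.page.settings.userAgent'] = user_agent
    return webdriver.PhantomJS(executable_path=executable_path,
                               desired_capabilities=dcap)
```

With Selenium and PhantomJS installed, scrape_all(urls, make_phantomjs_driver) would run the whole list through one PhantomJS process.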

MaK answered Oct 20 '22 23:10