I've set up a simple web scraping script in Python with Selenium and PhantomJS. I have about 200 URLs in total to scrape. The script runs fine at first, but after about 20-30 URLs (the number varies; the failure seems random and isn't tied to any particular URL) I get the following error in Python:
selenium.common.exceptions.WebDriverException: Message: 'Can not connect to GhostDriver'
And my ghostdriver.log:
PhantomJS is launching GhostDriver...
[ERROR - 2014-07-04T17:27:37.519Z] GhostDriver - main.fail - {"message":"Could not start Ghost Driver","line":82,"sourceId":140692115795456,"sourceURL":":/ghostdriver/main.js","stack":"Error: Could not start Ghost Driver\n at :/ghostdriver/main.js:82","stackArray":[{"sourceURL":":/ghostdriver/main.js","line":82}]}
I've searched, and most of the questions on SO are from people who can't scrape even a single URL. The only other question I've found where the error occurs in the middle of the script is this one, and the answer there is to upgrade PhantomJS to the latest version, which I've done. The other answer simply says to try that URL again, which doesn't seem like a good solution since the URL could simply fail again.
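For reference, the "try that URL again" suggestion amounts to something like the sketch below. The helper name `fetch_with_retry` and its parameters are made up for illustration and are not part of Selenium; `fetch` stands for whatever function loads a single URL. It only bounds the number of attempts, so it does nothing about the underlying cause.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Call fetch(url), retrying up to `attempts` times on the given exceptions."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)  # brief pause before trying again
    # Every attempt failed: surface the last error instead of looping forever.
    raise last_error
```

In the script this would wrap the per-URL work, with `retry_on` narrowed to `WebDriverException` so that unrelated errors are not silently retried.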
I am running PhantomJS 1.9.7 and Selenium 2.42.1 with Python 2.7.6 on Linux Mint 17.
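from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# About 200 URLs in total; the rest of the list is elided here.
for url in ['example.com/1/', 'example.com/2/', 'example.com/3/']:
    user_agent = 'Chrome'
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap['phantomjs.page.settings.userAgent'] = user_agent
    # A new PhantomJS instance is started for every URL.
    driver = webdriver.PhantomJS(executable_path='/usr/bin/phantomjs', desired_capabilities=dcap)
    driver.get(url)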
I had the same problem. To fix it, I installed PhantomJS from source.
For Linux (Debian):
sudo apt-get update
sudo apt-get install build-essential chrpath git-core libssl-dev libfontconfig1-dev libxft-dev
git clone https://github.com/ariya/phantomjs.git
cd phantomjs
git checkout 1.9
./build.sh
For Mac OS:
git clone https://github.com/ariya/phantomjs.git
cd phantomjs
git checkout 1.9
./build.sh
For other systems, check the following link: http://phantomjs.org/build.html
Optional (install the built binary system-wide):
cd bin
chmod +x phantomjs
sudo cp phantomjs /usr/bin/
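After copying the binary, it can be worth confirming which version Selenium will actually pick up. Here is a minimal sketch, assuming the binary sits at /usr/bin/phantomjs as in the question; the helper names `parse_version` and `phantomjs_version` are my own, and the only PhantomJS feature used is its `--version` flag.

```python
import re
import subprocess


def parse_version(text):
    """Turn a version string such as '1.9.7' into a tuple of ints."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    return tuple(int(part) for part in match.groups())


def phantomjs_version(executable="/usr/bin/phantomjs"):
    """Run `phantomjs --version` and return the parsed version tuple."""
    output = subprocess.check_output([executable, "--version"])
    return parse_version(output.decode("utf-8"))
```

Calling `phantomjs_version()` before starting the scrape should report a version from the 1.9 branch if the freshly built binary is the one on that path.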
I figured it out from my ghostdriver.log file, which said:
[ERROR - 2014-09-04T19:33:30.842Z] GhostDriver - main.fail - {"message":"Could not start Ghost Driver","line":82,"sourceId":140145669488128,"sourceURL":":/ghostdriver/main.js","stack":"Error: Could not start Ghost Driver\n at :/ghostdriver/main.js:82","stackArray":[{"sourceURL":":/ghostdriver/main.js","line":82}]}
I was sure that some files it uses for certain edge cases must be missing, so I decided to build from source, and it's working fine now.