
Headless script crashes after a few runs

I have a script that drives a headless browser, and I run it from cron (set up with crontab -e). It runs fine the first few times and then crashes with the following traceback:

Traceback (most recent call last):
  File "/home/clint-selenium-firefox.py", line 83, in <module>
    driver.get(url)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 248, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Failed to decode response from marionette

My crontab line is:

*/10 * * * * export DISPLAY=:0 && python /home/clint-selenium-firefox.py >> /home/error.log 2>&1

I don't want to overload the question with the whole Python script, so I've pulled out what I think are the relevant bits:

from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=0, size=(800, 600))
display.start()
...
driver = webdriver.Firefox()
driver.get(url)
...
driver.quit()
...
display.stop()

Your help is much appreciated.

EDIT

Versions: Firefox 49.0.2; Selenium 3.0.1; geckodriver 0.11.1 (geckodriver-v0.11.1-linux64.tar.gz)

Code around the error (it fails on driver.get(url)):

driver = webdriver.Firefox()
if DEBUG: print "Opened Firefox"

for u in urls:
    list_of_rows = []
    list_of_old_rows = []

    # get the old version of the site data
    mycsvfile = u[1]
    try:
        with open(mycsvfile, 'r') as csvfile:
            old_data = csv.reader(csvfile, delimiter=' ', quotechar='|')
            for o in old_data:
                list_of_old_rows.append(o)
    except: pass

    # get the new data
    url = u[0]
    if DEBUG: print url    

    driver.get(url)
    if DEBUG: print driver.title
    time.sleep(1)
    page_source = driver.page_source
    soup = bs4.BeautifulSoup(page_source,'html.parser')  
Asked by HenryM, Nov 15 '16 at 14:11


1 Answer

From the issue "Multiple Firefox instances failing with NS_ERROR_SOCKET_ADDRESS_IN_USE" (#99):

"This is because no --marionette-port option is passed to geckodriver - which means all instances of geckodriver launch firefox passing the same desired default port (2828). The first firefox instance binds to that port, future instances can't and all the geckodriver instances end up connecting to the first firefox instance - which produces all sorts of unpredictable behavior."

Followed by:

"I think a reasonable short-term solution is to do what the other drivers are doing and ask Marionette to bind to a randomised, free port generated by geckodriver. Currently it uses 2828 as the default for all instances it spawns of Firefox. Since Marionette unfortunately does not yet have an out-of-band way of communicating the port back to the client (geckodriver), this is inherently racy but we can improve the situation in the future with one of the proposals from bug 1240830."
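To illustrate the "randomised, free port" idea and why it is racy, here is a minimal sketch using only the Python standard library. The helper names find_free_port and port_is_free are hypothetical and are not part of geckodriver or Selenium, and this is not geckodriver's actual implementation, only the general technique of asking the OS for an unused port.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a TCP port that is unused right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, 0))          # port 0 lets the kernel pick a free port
        return sock.getsockname()[1]
    finally:
        # Once the probe socket is closed, nothing reserves the port:
        # another process can grab it before the real server binds.
        sock.close()


def port_is_free(port, host="127.0.0.1"):
    """Return True if we can bind to host:port, False if it is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except socket.error:              # alias of OSError on Python 3
        return False
    finally:
        sock.close()
```

The gap between closing the probe socket and the real process binding to that port is the race the quote refers to. A check such as port_is_free(2828) at the start of a run can also hint at whether something, for example a Firefox left over from an earlier run, is still holding Marionette's default port on localhost.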

This change was made in Selenium 3.0.0.b2:
* Updated Marionette port argument to match other drivers.

My guess is that random port selection only works for so long before two instances collide. I would raise an issue, since a code fix may be required for the versions of Selenium, Firefox and geckodriver that you have. Until this is fixed, you could drop back to Selenium 2.53.0 and Firefox ESR 38.8. Your call.
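If the collisions come from a new cron run starting while an earlier run is still alive, which the question does not confirm, one workaround is to let only one run proceed at a time. Below is a minimal sketch using an advisory file lock on Linux. The function name and the lock path are hypothetical.

```python
import fcntl


def acquire_single_instance_lock(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file object (keep a reference to it for as long as
    the lock is needed), or None if the lock is already held elsewhere.
    """
    handle = open(lock_path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError):
        # Someone else holds the lock, so give up instead of waiting.
        handle.close()
        return None
    return handle


# Hypothetical use at the top of the cron script:
#
#     lock = acquire_single_instance_lock("/tmp/clint-selenium-firefox.lock")
#     if lock is None:
#         raise SystemExit("previous run still active, skipping this one")
```

The lock is released when the file is closed or when every process holding it open has exited, so a crashed run does not leave a stale lock that has to be cleaned up by hand.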

UPDATE: Try passing the Firefox binary explicitly:

from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

binary = FirefoxBinary('path/to/binary')
driver = webdriver.Firefox(firefox_binary=binary)
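Separately, a Firefox left running by a failed run could keep holding the Marionette port, so it may help to guarantee that driver.quit() and display.stop() always run. Whether leftover processes are the cause here is an assumption. Below is a minimal sketch with a hypothetical helper, run_with_cleanup, which takes factories so that it does not depend on any particular Selenium version.

```python
def run_with_cleanup(start_display, start_driver, work):
    """Run work(driver), always quitting the driver and stopping the display.

    start_display and start_driver are zero-argument callables and work
    receives the driver. All three names are illustrative.
    """
    display = start_display()
    try:
        driver = start_driver()
        try:
            return work(driver)
        finally:
            driver.quit()      # runs even if work() raised
    finally:
        display.stop()         # runs even if the driver failed to start


# Hypothetical wiring with the objects from the question:
#
#     def start_display():
#         display = Display(visible=0, size=(800, 600))
#         display.start()
#         return display
#
#     run_with_cleanup(start_display, webdriver.Firefox, scrape_all_urls)
#
# where scrape_all_urls(driver) would hold the loop over urls.
```

With this shape, an exception from driver.get(url) still propagates to the log, but the browser and the virtual display are shut down first.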
Answered by MikeJRamsey56, Oct 13 '22 at 00:10