Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pickling Selenium Webdriver Objects

I want to serialize and store a selenium webdriver object so then I could use it later elsewhere in my code. I'm trying to use pickle to do this. If there is another way to save the state of a webdriver object, so I can bring it up again later, that'd be great (I can't just reload the url, since the websites I am looking at are javascript-heavy and the current page depends on what I've clicked on so far).

Currently, I have code like this.

import pickle
from selenium import webdriver

d = webdriver.PhantomJS()
d.get(url)
d.find_element_by_xpath(xpath).click()
p = pickle.dumps(d, pickle.HIGHEST_PROTOCOL)
# Stuff happens here.
new_driver = pickle.loads(p)
print new_driver.page_source.encode('utf-8', 'ignore')

When I run this, I get the following error (the error occurs when I print, not before):

    return self.driver.page_source.encode('utf-8', 'ignore')
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 436, in page_source
    return self.execute(Command.GET_PAGE_SOURCE)['value']
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 163, in execute
    response = self.command_executor.execute(driver_command, params)
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 349, in execute
    return self._request(url, method=command_info[0], data=data)
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 396, in _request
    response = opener.open(request)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>

Is it possible to serialize my webdriver objects? If not, what are my alternatives?

UPDATE:

Upon further inspection, even if I do something like d.get(url) again instead of printing the page source, it gives me the same error. Does something happen to the webdriver object when it is pickled/unpickled?

like image 763
Marcus Gladir Avatar asked Aug 08 '14 16:08

Marcus Gladir


People also ask

How does pickle load work?

The pickle module implements serialization protocol, which provides an ability to save and later load Python objects using special binary format. Unlike json , pickle is not limited to simple objects. It can also store references to functions and classes, as well as the state of class instances.

What is pickle serialization?

What is pickling? Pickle is used for serializing and de-serializing Python object structures, also called marshalling or flattening. Serialization refers to the process of converting an object in memory to a byte stream that can be stored on disk or sent over a network.

How would you save a cookie value in Webdriver?

When the code is executed, WebDriver will store the cookie information using FileWriter Class to write streams of characters and BufferedWriter to write the text into a file named “Cookiefile. data“. The file stores cookie information – “Name, Value, Domain, Path”.


1 Answers

I was able to pickle a selenium.webdriver.Remote object. Neither dill or pickle worked for me to serialize a selenium.webdriver.Chrome object, in which python creates and runs the browser process. However, they both worked if I (1) ran the standalone java selenium2 webserver, (2) in one process, create a selenium.webdriver.Remote connection to that server and pickle/dill that to a file, (3) In another process, unserialize the Remote instance and use it.

This led to being able to close the python process and then re-connect to the existing webdriver browser and issue new commands (could be from a different python script). If I close the selenium web browser then a new instance needs to be created from scratch.

server.py:

import pickle
import selenium.webdriver

EXECUTOR = 'http://127.0.0.1:4444/wd/hub'
FILENAME = '/tmp/pickle'

opt = selenium.webdriver.chrome.options.Options()
capabilities = opt.to_capabilities()
driver = selenium.webdriver.Remote(command_executor=EXECUTOR, desired_capabilities=capabilities)
fp = open(FILENAME, 'wb')
pickle.dump(driver, fp)

client.py:

import pickle

FILENAME = '/tmp/pickle'

driver = pickle.load(open(FILENAME, 'rb')
driver.get('http://www.google.com')
el = driver.find_element_by_id('lst-ib')
print(el)

Note (2020-08-08): Pickling selenium in this way stopped working in the latest selenium (4.x). Pickle fails to pickle an internal socket object. One option is to add a 'selenium=3.141.0' item to the install_requires component in setup.py which still works for me.

like image 196
mathewguest Avatar answered Oct 08 '22 12:10

mathewguest