I want to serialize and store a Selenium webdriver object so that I can use it later elsewhere in my code. I'm trying to use pickle to do this. If there is another way to save the state of a webdriver object so that I can bring it up again later, that would be great. (I can't just reload the URL, since the websites I am looking at are JavaScript-heavy and the current page depends on what I've clicked on so far.)
Currently, I have code like this:
import pickle
from selenium import webdriver
d = webdriver.PhantomJS()
d.get(url)
d.find_element_by_xpath(xpath).click()
p = pickle.dumps(d, pickle.HIGHEST_PROTOCOL)
# Stuff happens here.
new_driver = pickle.loads(p)
print new_driver.page_source.encode('utf-8', 'ignore')
When I run this, I get the following error (the error occurs when I print, not before):
return self.driver.page_source.encode('utf-8', 'ignore')
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 436, in page_source
return self.execute(Command.GET_PAGE_SOURCE)['value']
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 163, in execute
response = self.command_executor.execute(driver_command, params)
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 349, in execute
return self._request(url, method=command_info[0], data=data)
File "/home/eric/dev/crawler-env/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 396, in _request
response = opener.open(request)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>
Is it possible to serialize my webdriver objects? If not, what are my alternatives?
UPDATE:
Upon further inspection, even if I do something like d.get(url) again instead of printing the page source, it gives me the same error. Does something happen to the webdriver object when it is pickled/unpickled?
The pickle module implements a serialization protocol that can save and later load Python objects using a binary format. Unlike json, pickle is not limited to simple objects: it can also store references to functions and classes, as well as the state of class instances.
What is pickling? Pickle is used for serializing and de-serializing Python object structures, also called marshalling or flattening. Serialization refers to the process of converting an object in memory to a byte stream that can be stored on disk or sent over a network.
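As a minimal sketch of that round trip, the following pickles a small class instance to bytes and restores it. The CrawlState class and its fields are hypothetical, made up for illustration; they are not part of Selenium.

```python
import pickle


class CrawlState:
    """Hypothetical plain-data record of where a crawl has got to."""

    def __init__(self, url, clicked):
        self.url = url
        self.clicked = clicked


state = CrawlState("http://example.com/", ["menu", "next-page"])

# Serialize the instance to a byte string; this could be written to disk
# or sent over a network.
blob = pickle.dumps(state, pickle.HIGHEST_PROTOCOL)

# Deserialize: the result is a new, independent object with the same state.
restored = pickle.loads(blob)
```

This works because the instance holds only plain data. Objects that wrap live resources, such as an open network connection or a running browser process, are a different matter.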
When the code is executed, WebDriver stores the cookie information in a file named "Cookiefile.data", using the Java FileWriter class to write streams of characters and BufferedWriter to write the text. The file stores each cookie's name, value, domain, and path.
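In Python, a rough equivalent could pickle the list of cookie dictionaries that the driver returns from get_cookies(). This is only a sketch under assumptions: save_cookies and load_cookies are hypothetical helper names, and the driver is expected to already be on a page of the cookies' domain before add_cookie() is called. Cookies capture the session side of things, not the JavaScript state of the page, so on their own they would likely not solve the problem in the question.

```python
import pickle


def save_cookies(driver, path):
    """Write the driver's current cookies (a list of dicts) to a file."""
    with open(path, "wb") as fp:
        pickle.dump(driver.get_cookies(), fp)


def load_cookies(driver, path):
    """Read cookies back from a file and add each one to the driver.

    Assumes the driver has already navigated to the cookies' domain.
    """
    with open(path, "rb") as fp:
        cookies = pickle.load(fp)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return cookies
```

A typical use would be to call save_cookies(d, '/tmp/cookies.pkl') before the first process exits, then in the second process open the site with a fresh driver and call load_cookies(d, '/tmp/cookies.pkl'). The path here is just an example.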
I was able to pickle a selenium.webdriver.Remote object. Neither dill nor pickle worked for me to serialize a selenium.webdriver.Chrome object, in which Python creates and runs the browser process. However, they both worked if I (1) ran the standalone Java Selenium 2 server, (2) in one process, created a selenium.webdriver.Remote connection to that server and pickled/dilled it to a file, and (3) in another process, unserialized the Remote instance and used it.
This made it possible to close the Python process and then reconnect to the existing webdriver browser and issue new commands (possibly from a different Python script). If I close the Selenium web browser, a new instance needs to be created from scratch.
server.py:
import pickle
import selenium.webdriver
EXECUTOR = 'http://127.0.0.1:4444/wd/hub'
FILENAME = '/tmp/pickle'
opt = selenium.webdriver.chrome.options.Options()
capabilities = opt.to_capabilities()
driver = selenium.webdriver.Remote(command_executor=EXECUTOR, desired_capabilities=capabilities)
# Close the file explicitly so the pickle is fully written before client.py reads it.
with open(FILENAME, 'wb') as fp:
    pickle.dump(driver, fp)
client.py:
import pickle
FILENAME = '/tmp/pickle'
with open(FILENAME, 'rb') as fp:
    driver = pickle.load(fp)
driver.get('http://www.google.com')
el = driver.find_element_by_id('lst-ib')
print(el)
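One way to read this result is that the unpickled object is only a client: it is useful as long as the server it points at is still running. The toy below illustrates that idea with the standard library alone. It is an analogy, not Selenium code; ToyClient, PingHandler and demo are hypothetical names, and the real Remote driver holds more state than a URL.

```python
import pickle
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Tiny stand-in for a long-running server process."""

    def do_GET(self):
        body = b"pong"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


class ToyClient:
    """Holds only the server address, so it pickles without trouble."""

    def __init__(self, url):
        self.url = url

    def ping(self):
        # Bypass any proxy settings so the request goes straight to localhost.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(self.url, timeout=5) as response:
            return response.read().decode("ascii")


def demo():
    server = HTTPServer(("127.0.0.1", 0), PingHandler)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]

    # Pickle the client and restore it while the server is still up.
    restored = pickle.loads(pickle.dumps(ToyClient(url)))
    while_running = restored.ping()

    # Stop the server; the restored client now points at nothing.
    server.shutdown()
    server.server_close()
    thread.join()
    try:
        restored.ping()
        after_shutdown = "still reachable"
    except OSError:  # urllib.error.URLError is a subclass of OSError
        after_shutdown = "unreachable"
    return while_running, after_shutdown
```

Calling demo() exercises both cases. The failure after shutdown is the same kind of error as the "Connection refused" URLError in the question, where the unpickled driver tried to reach a server that was not there.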
Note (2020-08-08): Pickling Selenium in this way stopped working in the latest Selenium (4.x): pickle fails to pickle an internal socket object. One option that still works for me is to add a 'selenium==3.141.0' item to install_requires in setup.py.
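The socket failure is easy to reproduce in isolation. The sketch below shows that a bare socket cannot be pickled, and that an object holding one can still be pickled if it drops the socket from its saved state. The Connection class is a hypothetical stand-in, not Selenium's actual internals, and whether a similar workaround is practical for a Selenium 4 driver is not something this example establishes.

```python
import pickle
import socket


class Connection:
    """Hypothetical object that keeps a live socket as an attribute."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.sock = socket.socket()  # created but never connected

    def __getstate__(self):
        # Leave the socket out of the pickled state; only the address survives.
        state = self.__dict__.copy()
        state["sock"] = None
        return state


def try_pickle(obj):
    """Return the pickled bytes, or the TypeError raised while pickling."""
    try:
        return pickle.dumps(obj)
    except TypeError as exc:
        return exc


raw = socket.socket()
raw_result = try_pickle(raw)  # a TypeError: sockets refuse to be pickled
raw.close()

conn = Connection("127.0.0.1", 4444)
blob = try_pickle(conn)  # bytes, because __getstate__ removed the socket
restored = pickle.loads(blob)
conn.sock.close()
```

The restored object has its address but no socket, so it would have to open a new connection before it could be used.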