I have a soup from BeautifulSoup that I cannot pickle. When I try to pickle the object, the Python interpreter crashes silently, so the failure cannot be caught as an exception. I need to pickle the object in order to return it through the multiprocessing package, which pickles objects to pass them between processes. How can I troubleshoot or work around the problem?

Unfortunately, I cannot post the HTML for the page (it is not publicly available), and I have been unable to find a reproducible example of the problem. I have tried to isolate the problem by looping over the soup and pickling individual components; the smallest thing that produces the error is an object of type <class 'BeautifulSoup.NavigableString'>. When I print the object, it prints u'\n'.
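Because the crash kills the interpreter instead of raising an exception, one way to narrow it down is to run each pickling attempt in a throwaway child interpreter and inspect its exit status. The following is a minimal sketch of that idea, assuming Python 3.7+ and only the standard library; the helper name `survives_in_child` and the code strings passed to it are hypothetical placeholders, not part of BeautifulSoup or multiprocessing.

```python
import subprocess
import sys


def survives_in_child(code, timeout=30):
    """Run `code` in a fresh interpreter and report whether it exited cleanly.

    Returns a tuple (ok, returncode). A crash in the child cannot take
    down the calling process, so the result can be inspected safely.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # keep the child's output out of our console
        timeout=timeout,
    )
    return proc.returncode == 0, proc.returncode


if __name__ == "__main__":
    # Placeholder snippet: in practice the child would parse the page and
    # pickle a single component of the soup.
    ok, status = survives_in_child("import pickle; pickle.dumps('x')")
    print(ok, status)
```

On POSIX systems a negative return code means the child was terminated by a signal, which helps distinguish a hard crash from an ordinary uncaught exception (exit status 1).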
The class NavigableString is not serializable with pickle or cPickle, which multiprocessing uses. You should be able to serialize this class with dill, however. dill provides a superset of the pickle interface and can serialize most Python objects. multiprocessing will still fail unless you use pathos.multiprocessing, a fork of multiprocessing that uses dill.
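As a rough illustration of the drop-in interface, here is a minimal round-trip sketch. It assumes dill is installed (`pip install dill`) and falls back to the standard pickle module otherwise; the helper `roundtrip` is a hypothetical name used only for this example, and the plain string stands in for the NavigableString from the question, which is not reproduced here.

```python
import pickle

try:
    import dill  # third-party; assumed installed via `pip install dill`
except ImportError:
    dill = None  # the sketch then falls back to the standard pickle module


def roundtrip(obj):
    """Serialize obj to bytes and restore it, preferring dill over pickle."""
    serializer = dill if dill is not None else pickle
    return serializer.loads(serializer.dumps(obj))


if __name__ == "__main__":
    # A bare newline, like the content printed in the question.
    print(repr(roundtrip(u"\n")))
    if dill is not None:
        # pickle rejects lambdas; dill is expected to handle them.
        add_one = roundtrip(lambda x: x + 1)
        print(add_one(1))
```

If the soup survives this round trip with dill, the next step would be to swap the worker pool for the one in pathos.multiprocessing, so that return values are serialized with dill rather than pickle when they cross process boundaries.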
Get the code here: https://github.com/uqfoundation.
For more information see: What can multiprocessing and dill do together?
http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/
http://nbviewer.ipython.org/gist/minrk/5241793