
BeautifulSoup Object Will Not Pickle, Causes Interpreter to Silently Crash

I have a soup from BeautifulSoup that I cannot pickle. When I try to pickle the object, the Python interpreter silently crashes, so the failure cannot be caught and handled as an exception. I need to be able to pickle the object in order to return it from a child process using the multiprocessing package, which pickles objects to pass them between processes. How can I troubleshoot or work around the problem?

Unfortunately, I cannot post the HTML for the page (it is not publicly available), and I have been unable to find a reproducible example of the problem. I have tried to isolate the problem by looping over the soup and pickling individual components; the smallest thing that produces the error is <class 'BeautifulSoup.NavigableString'>. When I print the object, it prints out u'\n'.
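One possible (unconfirmed) reason a string as small as u'\n' can be hard to pickle: BeautifulSoup's NavigableString subclasses the built-in string type and also stores links to neighbouring elements in the parse tree, so pickling it may drag the linked tree along with it. The pickle documentation notes that highly recursive structures can exceed the recursion limit, and the sys.setrecursionlimit documentation warns that setting the limit too high can crash the interpreter. The sketch below models that situation with the standard library only. Node, TreeString and build_chain are hypothetical stand-ins, not BeautifulSoup's real classes, and the code is written for Python 3 (on Python 2 the equivalent conversion would use unicode()).

```python
import pickle


class Node:
    """Hypothetical stand-in for a parse-tree element (NOT BeautifulSoup's class)."""

    def __init__(self, name, next_element=None):
        self.name = name
        self.next_element = next_element  # link to a neighbouring element


class TreeString(str):
    """Hypothetical str subclass that can carry links into a tree.

    Instances get a __dict__, so pickle saves those attributes as well.
    """


def build_chain(length):
    """Build a linked chain of `length` hypothetical Node objects."""
    head = None
    for i in range(length):
        head = Node(f"node{i}", head)
    return head


# A one-character string that also points into a (small) linked structure.
text = TreeString("\n")
text.next_element = build_chain(200)

# Pickling the subclass instance also pickles every linked Node, recursively.
wrapped_bytes = pickle.dumps(text)

# str() returns a plain str copy: same characters, no attributes attached.
plain = str(text)
plain_bytes = pickle.dumps(plain)

print(repr(plain), len(plain_bytes), len(wrapped_bytes))
```

If the real tree is deep enough, the recursive walk in the first pickle.dumps call is what could overflow. Under that assumption, one workaround is to convert before the object crosses the process boundary, for example by returning str(soup) from the worker and re-parsing it in the parent.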

asked Jul 03 '14 20:07 by Michael



1 Answer

The class NavigableString is not serializable with pickle or cPickle, which is what multiprocessing uses. You should, however, be able to serialize this class with dill. dill provides a superset of the pickle interface and can serialize most of Python. multiprocessing itself will still fail, because it uses pickle internally, unless you use pathos.multiprocessing, a fork of multiprocessing that uses dill instead.
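A minimal sketch of the difference, assuming dill is installed (pip install dill). It uses a lambda rather than a NavigableString because that needs no HTML, so it only shows dill's pickle-like dumps/loads interface accepting an object that the standard pickle module rejects; it does not by itself prove that dill handles this particular soup.

```python
import pickle

import dill  # third-party package: pip install dill

# A lambda has no importable name, so the standard pickle module refuses it.
square = lambda x: x * x

try:
    pickle.dumps(square)
    pickle_failed = False
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    pickle_failed = True
    print("pickle failed:", exc)

# dill serializes the function by value, so it survives a round trip.
restored = dill.loads(dill.dumps(square))
print(restored(4))  # prints 16
```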

Get the code here: https://github.com/uqfoundation


For more information, see: What can multiprocessing and dill do together?

http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/

http://nbviewer.ipython.org/gist/minrk/5241793

answered Sep 28 '22 18:09 by Mike McKerns
