I've spent the last couple of days getting to grips with the basics of lxml, in particular using lxml.html to parse websites and create an ElementTree of the content. Ideally, I want to save the returned ElementTree so that I can load it up and experiment with it without having to parse the website every time I modify my script. I assumed that pickling would be the way to go; however, I'm now beginning to wonder. Although I am able to retrieve an ElementTree object after pickling...
type(myObject)
returns
<class 'lxml.etree._ElementTree'>
the object itself appears to be 'empty', since none of the subsequent method/attribute calls I make on it yield any output.
My guess is that pickling isn't appropriate here, but can anyone suggest an alternative?
(In case it matters, the above is happening in: Python 3.2, lxml 2.3.2, Snow Leopard)
lxml is not written in plain Python: it is a binding to two C libraries, libxml2 and libxslt. Its objects wrap C data structures, so they probably don't support Python pickling or any other kind of serialization, except serializing them to XML (or HTML).
So you'll either have to keep them in memory, or save the serialized markup and re-parse it when you need it, I assume.
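For what it's worth, here is a minimal sketch of the "serialize, then re-parse" approach. The helper names save_tree and load_tree are made up for illustration, and the sketch assumes the page is HTML and that writing it out as UTF-8 is acceptable; treat it as a starting point rather than the definitive way to do it.

```python
import lxml.html


def save_tree(tree, path):
    """Write a parsed document to disk as serialized HTML (hypothetical helper)."""
    # _ElementTree.write() serializes the whole document; method="html"
    # keeps HTML conventions such as unclosed void elements.
    tree.write(path, encoding="utf-8", method="html")


def load_tree(path):
    """Re-parse the cached file into a fresh ElementTree (hypothetical helper)."""
    # Tell the parser which encoding save_tree() used, so it does not
    # have to guess when the page carries no charset declaration.
    parser = lxml.html.HTMLParser(encoding="utf-8")
    return lxml.html.parse(path, parser=parser)
```

You would call save_tree once after fetching the page, then call load_tree at the top of your script while experimenting. Re-parsing a local file is normally much cheaper than hitting the website again.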