
saving an 'lxml.etree._ElementTree' object

I've spent the last couple of days getting to grips with the basics of lxml; in particular, using lxml.html to parse websites and create an ElementTree of the content. Ideally, I want to save the returned ElementTree so that I can load it up and experiment with it without having to parse the website every time I modify my script. I assumed that pickling would be the way to go, but I'm now beginning to wonder. Although I am able to retrieve an ElementTree object after pickling...

type(myObject) 

returns

<class 'lxml.etree._ElementTree'>

the object itself appears to be 'empty', since none of the subsequent method/attribute calls I make on it yield any output.

My guess is that pickling isn't appropriate here, but can anyone suggest an alternative?

(In case it matters, the above is happening in Python 3.2, lxml 2.3.2, on Snow Leopard.)

Paul Patterson, asked Nov 25 '11 21:11




1 Answer

lxml is a Python binding for C libraries (libxml2 and libxslt, to be precise), so its objects probably don't support Python pickling or any other kind of serialization, except serializing them to XML.

So you'll either have to keep them in memory, or re-parse the XML fragments you need, I assume.
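
As a rough sketch of the "serialize, then re-parse" route: the tree can be written to a local file once and re-parsed from that file on later runs, so the website itself is only fetched once. The `PAGE` string, the `save_tree`/`load_tree` helpers, and the cache file name below are made up for illustration; only `write()`, `getroottree()`, `lxml.html.document_fromstring()` and `lxml.html.parse()` are lxml calls.

```python
import os
import tempfile

import lxml.html

# Hypothetical stand-in for a page fetched from a website.
PAGE = ("<html><head><title>Demo</title></head>"
        "<body><p id='x'>Hello</p></body></html>")


def save_tree(tree, path):
    """Write the ElementTree to disk as serialized HTML."""
    tree.write(path, method="html", encoding="utf-8")


def load_tree(path):
    """Re-parse the saved file; it is local, so no network fetch is needed."""
    return lxml.html.parse(path)


# Parse once (in real use, from the website), then cache to a file.
original = lxml.html.document_fromstring(PAGE).getroottree()
cache_path = os.path.join(tempfile.mkdtemp(), "page_cache.html")
save_tree(original, cache_path)

# Later runs can start from the cached file instead of the website.
reloaded = load_tree(cache_path)
print(type(reloaded))
print(reloaded.findtext(".//title"))
```

The reloaded object is a freshly parsed tree rather than the original object, which should be fine for experimenting with queries against the same content.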

Has QUIT--Anony-Mousse, answered Oct 02 '22 20:10