
Python lxml the E-Factory

I've been using the lxml "E-Factory" (a.k.a. ElementMaker) for creating XML documents. I'm trying to generate an XML document similar to this:

<url>
  <date-added>2011-11-11</date-added>
</url>

However, using the E-factory, I'm not sure how to specify the dash in the 'date-added' element. It seems to be interpreting the dash as a minus sign.

Here are the docs I've been referring to: http://lxml.de/tutorial.html#the-e-factory

Here is how to reproduce the error:

from lxml import etree
from lxml.builder import ElementMaker 

E = ElementMaker()
URL = E.url
DATE_ADDED = E.date-added

xml = URL(DATE_ADDED(myobject.created.strftime('%Y-%m-%dT%H:%M:%S')),)


NameError: global name 'added' is not defined

Does anyone know a trick to get it to properly render the element with a dash?

Thank you for reading this.

Joe

Joe J asked Nov 09 '11 06:11


2 Answers

Explanation: What you put after E. needs to be a valid Python identifier. Identifiers may contain underscores but not hyphens. E.date-added is compiled "successfully" as if it were (E.date) - added, but then fails at run time because (in your case) added was not defined.
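To see the parse described above without running the expression, here is a minimal sketch using the standard library's ast module. The variable names (expr, is_subtraction, and so on) are only illustrative.

```python
import ast

# Parse the expression from the question without evaluating it.
expr = ast.parse("E.date-added", mode="eval").body

# The top-level node is a binary subtraction, not an attribute lookup.
is_subtraction = isinstance(expr, ast.BinOp) and isinstance(expr.op, ast.Sub)
left_attribute = expr.left.attr   # the attribute actually looked up on E
right_name = expr.right.id        # the bare name Python then tries to resolve

print(is_subtraction, left_attribute, right_name)  # True date added
```

The NameError in the question comes from that right-hand name: added is looked up as an ordinary variable and is not defined.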

Alternatives:

(1) The E.tag syntax is just a cosmetic trick that doesn't work with all legal XML tags. In reality, Python object attribute names can be just about any old rubbish; you just can't write obj.really+funky%attribute*name,dude in source code. One dud trick deserves a better trick: you can keep the same pattern of element creation, i.e. you don't need to specify the tag every time you create an element, by doing:

DATE_ADDED = getattr(E, 'date-added')

and then using DATE_ADDED as you do now.
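Why getattr accepts a name the dot syntax cannot express can be sketched with a small stand-in class. TinyMaker below is hypothetical and built on the standard library's xml.etree.ElementTree; it is not lxml's implementation, only an illustration of the general __getattr__ pattern, and it ignores details such as tail text.

```python
from functools import partial
import xml.etree.ElementTree as ET


class TinyMaker:
    """Hypothetical stand-in for an ElementMaker-style factory (not lxml's code)."""

    def __call__(self, tag, *children):
        elem = ET.Element(tag)
        for child in children:
            if isinstance(child, str):
                elem.text = (elem.text or "") + child  # string children become text
            else:
                elem.append(child)                     # element children are nested
        return elem

    def __getattr__(self, tag):
        # Receives the attribute name as a plain string, so any string works here,
        # including ones that can never be written after a dot in source code.
        return partial(self, tag)


E = TinyMaker()
URL = E.url                             # valid identifier: dot syntax works
DATE_ADDED = getattr(E, "date-added")   # hyphenated name: pass it as a string

result = ET.tostring(URL(DATE_ADDED("2011-11-11")), encoding="unicode")
print(result)  # <url><date-added>2011-11-11</date-added></url>
```

The restriction is in Python's grammar for the dot syntax, not in attribute lookup itself, which is why getattr gets around it.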

(2) If the schema is under your control, use an underscore (date_added) instead of a hyphen (date-added).

John Machin answered Sep 22 '22 09:09


The ElementMaker maps an attribute name to a tag name (e.g. E.date_added) to build up the XML tree. However, there is a discrepancy between the characters allowed in HTML/XML tag names and those allowed in Python names. As stated in PEP 8 under "Package and Module Names": "Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability." Underscores are fine, so E.date_added works, but a name such as date-added contains a hyphen, which isn't allowed in a Python function name:

>>> def foo-bar():
  File "<stdin>", line 1
    def foo-bar():
           ^
SyntaxError: invalid syntax

To resolve it, just create the date-added tag a bit more verbosely by supplying the name as an argument instead:

>>> etree.tostring(E.url(E('date-added', '2011-11-11')))
'<url><date-added>2011-11-11</date-added></url>'
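For completeness, here is a short sketch that runs both suggestions side by side, assuming lxml is installed and Python 3 is in use. On Python 3, etree.tostring() returns bytes by default, so the sketch passes encoding="unicode" to get a str; the variable names are only illustrative.

```python
from lxml import etree
from lxml.builder import ElementMaker

E = ElementMaker()

# getattr approach: look the factory up by string once, then reuse it.
DATE_ADDED = getattr(E, "date-added")
via_getattr = E.url(DATE_ADDED("2011-11-11"))

# Direct-call approach: pass the tag name as the first argument to E.
via_call = E.url(E("date-added", "2011-11-11"))

# encoding="unicode" asks lxml for a str instead of bytes.
a = etree.tostring(via_getattr, encoding="unicode")
b = etree.tostring(via_call, encoding="unicode")

print(a)       # <url><date-added>2011-11-11</date-added></url>
print(a == b)  # True
```

Both forms build the same tree; the getattr form is convenient when the same tag is created many times, while the direct call keeps one-off tags inline.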
jro answered Sep 22 '22 09:09