"Invalid tag name" error when creating element with lxml in python

I am using lxml to create an XML file. My sample program is:

from lxml import etree
import datetime
dt=datetime.datetime(2013,11,30,4,5,6)
dt=dt.strftime('%Y-%m-%d')
page=etree.Element('html')
doc=etree.ElementTree(page)
dateElm=etree.SubElement(page,dt)
outfile=open('somefile.xml','w')
doc.write(outfile)

I get the following error output:

dateElm=etree.SubElement(page,dt)
  File "lxml.etree.pyx", line 2899, in lxml.etree.SubElement (src/lxml/lxml.etree.c:62284)
  File "apihelpers.pxi", line 171, in lxml.etree._makeSubElement (src/lxml/lxml.etree.c:14296)
  File "apihelpers.pxi", line 1523, in lxml.etree._tagValidOrRaise (src/lxml/lxml.etree.c:26852)
ValueError: Invalid tag name u'2013-11-30'

I thought it was a Unicode error, so I tried changing the encoding of 'dt' with calls like

  1. str(dt)
  2. unicode(dt).encode('unicode_escape')
  3. dt.encode('ascii','ignore')
  4. dt.encode('ascii','decode')

and a few others, but none of them worked; the same error message was produced each time.

Chandra kant asked Nov 30 '13 16:11


1 Answer

You get the error because element names are not allowed to begin with a digit in XML. See http://www.w3.org/TR/xml/#sec-common-syn and http://www.w3.org/TR/xml/#sec-starttags. The first character of a name must be a NameStartChar, which disallows digits.
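
As a rough illustration of that rule, here is a minimal sketch of a pre-check you could run before creating an element. It covers only the ASCII subset of the XML Name production (the spec also allows many non-ASCII characters), it is not the check lxml itself performs, and the helper name looks_like_xml_name is hypothetical.

```python
import re

# ASCII-only approximation of the XML "Name" production:
# first character: letter, underscore or colon (NameStartChar subset);
# later characters may also be digits, hyphens and periods (NameChar subset).
_ASCII_XML_NAME = re.compile(r'[A-Za-z_:][A-Za-z0-9_:.\-]*')


def looks_like_xml_name(name):
    """Return True if name matches the ASCII subset of an XML Name (hypothetical helper)."""
    return _ASCII_XML_NAME.fullmatch(name) is not None


print(looks_like_xml_name('2013-11-30'))   # starts with a digit
print(looks_like_xml_name('D2013-11-30'))  # starts with a letter
```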

An element such as <2013-11-30>...</2013-11-30> is invalid.

An element such as <D2013-11-30>...</D2013-11-30> is OK.
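
Applied to the program in the question, this could look like the sketch below. It shows two possible options, not the only ones: keep a fixed tag name and store the date as text, or prefix the date with a letter. The tag name 'date' and the prefix 'D' are illustrative choices, and the fallback import is only there so the sketch also runs where lxml is not installed.

```python
import datetime

try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library so the sketch still runs.
    import xml.etree.ElementTree as etree

dt = datetime.datetime(2013, 11, 30, 4, 5, 6).strftime('%Y-%m-%d')

page = etree.Element('html')

# Option 1: use a fixed, valid tag name and store the date as element text.
date_elm = etree.SubElement(page, 'date')
date_elm.text = dt

# Option 2: prefix the date with a letter so the name starts with a NameStartChar.
prefixed_elm = etree.SubElement(page, 'D' + dt)

xml_bytes = etree.tostring(page)
print(xml_bytes)
```

Option 1 is usually easier to consume, because readers of the file can look up one known tag name instead of a name that changes with the data.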

If your program is changed to use ElementTree instead of lxml (from xml.etree import ElementTree as etree instead of from lxml import etree), there is no error. But I would consider that a bug. lxml does the right thing, ElementTree does not.
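
A small sketch of why that is a problem, using only the standard library: ElementTree accepts the tag and serializes it, but the output is not well-formed, so parsing it back fails. This reflects CPython's xml.etree behaviour as I understand it; other versions or implementations may differ.

```python
import xml.etree.ElementTree as ET

page = ET.Element('html')
ET.SubElement(page, '2013-11-30')  # no tag-name validation happens here

serialized = ET.tostring(page, encoding='unicode')
print(serialized)

# The serialized text is not well-formed XML, so it cannot be parsed back.
try:
    ET.fromstring(serialized)
    reparsed_ok = True
except ET.ParseError as exc:
    reparsed_ok = False
    print('ParseError:', exc)
```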

mzjn answered Nov 02 '22 00:11