
In Python, how do I refer to an XML tag that contains a hyphen

Tags: python, xml

I'm trying to use BeautifulSoup to parse an XML file. One of the elements has a hyphen in it: distribution-code

How do I access it? I've tried:

soup.distribution-code
soup."distribution-code" (tried single quotes too)
soup.[distribution-code]

but none of these work.

OutThere asked Oct 21 '15 03:10




1 Answer

You can access non-hyphenated elements by attribute reference using regular Python syntax, i.e. obj.name. However, - is not a valid character in a Python identifier: Python treats it as the "minus" operator, so soup.distribution-code is read as soup.distribution minus code. Hence you cannot access hyphenated elements that way.
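To check how Python reads the expression, you can parse it with the standard-library ast module without evaluating it. This is only an illustrative sketch; the names soup and code do not need to exist, because nothing is executed.

```python
import ast

# Parse the expression only; nothing is evaluated, so no names need to exist.
tree = ast.parse("soup.distribution-code", mode="eval")
expr = tree.body

# The top-level node is a binary operation, not an attribute lookup.
print(type(expr).__name__)     # BinOp
print(type(expr.op).__name__)  # Sub

# Left operand: the attribute lookup soup.distribution
print(expr.left.value.id, expr.left.attr)  # soup distribution

# Right operand: a bare variable named code
print(expr.right.id)  # code
```

The right-hand operand is an ordinary variable lookup, which is why the interpreter complains about the name code when that variable has not been defined.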

Instead, use soup.find() or soup.find_all():

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<thing><id>1234</id><distribution-code>555444333</distribution-code></thing>', 'html.parser')
>>> soup.thing
<thing><id>1234</id><distribution-code>555444333</distribution-code></thing>
>>> soup.id
<id>1234</id>
>>> soup.distribution-code
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'code' is not defined
>>> soup.find('distribution-code')
<distribution-code>555444333</distribution-code>

Alternatively, as pointed out in chepner's comment, you can use getattr() and setattr() to get and set attributes whose names contain hyphens, e.g. getattr(soup, 'distribution-code'). I think soup.find() is the more common way to access such elements.
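A minimal sketch comparing the two lookups, reusing the sample document from the session above. The choice of 'html.parser' is my assumption so that the example needs nothing beyond bs4 itself; for a real XML file you may prefer the 'xml' parser, which requires lxml.

```python
from bs4 import BeautifulSoup

xml = "<thing><id>1234</id><distribution-code>555444333</distribution-code></thing>"

# 'html.parser' is an assumption made for portability; it ships with Python.
soup = BeautifulSoup(xml, "html.parser")

# Explicit lookup by tag name: the hyphen is just part of a string here.
by_find = soup.find("distribution-code")

# getattr() takes the name as a string too, so the hyphen is never parsed
# as an operator.
by_getattr = getattr(soup, "distribution-code")

print(by_find.string)         # 555444333
print(by_getattr is by_find)  # True

# find_all() returns every matching element as a list.
codes = [tag.string for tag in soup.find_all("distribution-code")]
print(codes)
```

Both lookups return the same element, so which one to use is mostly a matter of readability.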

mhawke answered Sep 28 '22 06:09