I have an XML structure that looks like the following, but on a much larger scale:
<root>
  <conference name='1'>
    <author> Bob </author>
    <author> Nigel </author>
  </conference>
  <conference name='2'>
    <author> Alice </author>
    <author> Mary </author>
  </conference>
</root>
For this, I used the following code:
dom = parse(filepath)
conference = dom.getElementsByTagName('conference')
for node in conference:
    conf_name = node.getAttribute('name')
    print conf_name
    alist = node.getElementsByTagName('author')
    for a in alist:
        authortext = a.nodeValue
        print authortext
However, the authortext that is printed is None. I tried variations like the one below, but they cause my program to break.
authortext=a[0].nodeValue
The correct output should be:
1
Bob
Nigel
2
Alice
Mary
But what I get is:
1
None
None
2
None
None
Any suggestions on how to tackle this problem?
There are two ways to parse XML using the ElementTree module: the parse() function and the fromstring() function. parse() reads an XML document supplied as a file, whereas fromstring() parses XML supplied as a string, for example a triple-quoted literal.
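As a rough illustration of those two entry points, here is a minimal sketch that reads a shortened version of the question's XML both ways. The XML constant and the helper name authors_by_conference are made up for this example, and io.StringIO stands in for a real file on disk.

```python
import io
import xml.etree.ElementTree as ET

# Shortened sample based on the question's structure (assumed for illustration).
XML = (
    "<root><conference name='1'>"
    "<author> Bob </author><author> Nigel </author>"
    "</conference></root>"
)

# fromstring() takes the XML text and returns the root Element directly.
root_from_string = ET.fromstring(XML)

# parse() takes a filename or file-like object and returns an ElementTree;
# StringIO is used here in place of a real file path.
root_from_file = ET.parse(io.StringIO(XML)).getroot()


def authors_by_conference(root):
    """Hypothetical helper: map each conference name to its author names."""
    return {
        conf.get('name'): [a.text.strip() for a in conf.findall('author')]
        for conf in root.findall('conference')
    }


print(authors_by_conference(root_from_string))
print(authors_by_conference(root_from_file))
```

With ElementTree the text of an element is available directly as .text, so there is no separate text node to look up.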
XML stands for eXtensible Markup Language. It was designed to store and transport small to medium amounts of data and is widely used for sharing structured information. Python lets you parse and modify XML documents. DOM-style parsers such as xml.dom.minidom load the entire document into memory before you can work with it.
Your node a is of type 1 (ELEMENT_NODE), and nodeValue is always None for element nodes. To get a string you normally need a TEXT_NODE, which here is the first child of the element. This will work:
authortext = a.childNodes[0].nodeValue
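Putting the fix into the original loop, a minimal sketch might look like the following. It uses parseString on an inline copy of the question's XML instead of parse(filepath) so that it runs on its own, it is written for Python 3 (print as a function), and the function name conference_authors is made up for this example. The strip() call is an addition to drop the spaces around each name.

```python
from xml.dom.minidom import parseString

# Inline copy of the sample XML from the question.
XML = """<root>
  <conference name='1'>
    <author> Bob </author>
    <author> Nigel </author>
  </conference>
  <conference name='2'>
    <author> Alice </author>
    <author> Mary </author>
  </conference>
</root>"""


def conference_authors(xml_text):
    """Hypothetical helper: return (conference name, [authors]) pairs."""
    dom = parseString(xml_text)
    result = []
    for node in dom.getElementsByTagName('conference'):
        authors = []
        for a in node.getElementsByTagName('author'):
            # a is an ELEMENT_NODE; its text lives in the first child,
            # which is a TEXT_NODE.
            authors.append(a.childNodes[0].nodeValue.strip())
        result.append((node.getAttribute('name'), authors))
    return result


for name, authors in conference_authors(XML):
    print(name)
    for author in authors:
        print(author)
```

Note that a.childNodes[0] raises an IndexError for an empty element such as <author></author>, so real data may need a check on a.childNodes first.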