Let's suppose we have the XML file with the structure as follows.
<?xml version="1.0" ?>
<searchRetrieveResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/zing/srw/ http://www.loc.gov/standards/sru/sru1-1archive/xml-files/srw-types.xsd" xmlns="http://www.loc.gov/zing/srw/">
<records xmlns:ns1="http://www.loc.gov/zing/srw/">
<record>
<recordData>
<record xmlns="">
<datafield tag="000">
<subfield code="a">123</subfield>
<subfield code="b">456</subfield>
</datafield>
<datafield tag="001">
<subfield code="a">789</subfield>
<subfield code="b">987</subfield>
</datafield>
</record>
</recordData>
</record>
<record>
<recordData>
<record xmlns="">
<datafield tag="000">
<subfield code="a">123</subfield>
<subfield code="b">456</subfield>
</datafield>
<datafield tag="001">
<subfield code="a">789</subfield>
<subfield code="b">987</subfield>
</datafield>
</record>
</recordData>
</record>
</records>
</searchRetrieveResponse>
I need to parse out:
I wonder how to do that using lxml and XPath. Pasted below is my initial code and I kindly ask someone to explain me, how to parse out values.
import urllib, urllib2
from lxml import etree
url = "https://dl.dropbox.com/u/540963/short_test.xml"
fp = urllib2.urlopen(url)
doc = etree.parse(fp)
fp.close()
ns = {'xsi':'http://www.loc.gov/zing/srw/'}
for record in doc.xpath('//xsi:record', namespaces=ns):
print record.xpath("xsi:recordData/record/datafield[@tag='000']", namespaces=ns)
I would be more direct in your XPath: go straight for the elements you want, in this case datafield
.
>>> for df in doc.xpath('//datafield'):
# Iterate over attributes of datafield
for attrib_name in df.attrib:
print '@' + attrib_name + '=' + df.attrib[attrib_name]
# subfield is a child of datafield, and iterate
subfields = df.getchildren()
for subfield in subfields:
print 'subfield=' + subfield.text
Also, lxml appears to let you ignore the namespace, maybe because your example only uses one namespace?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With