I have an XML file that I need to parse. I have no control over the format of the file and cannot change it.
The file makes use of a prefix (call it "a"), but it doesn't define a namespace for that prefix anywhere. I can't seem to use XPath to query for nodes with the "a" prefix.
Here are the contents of the XML document:
<?xml version="1.0" encoding="UTF-8"?>
<a:root>
<a:thing>stuff0</a:thing>
<a:thing>stuff1</a:thing>
<a:thing>stuff2</a:thing>
<a:thing>stuff3</a:thing>
<a:thing>stuff4</a:thing>
<a:thing>stuff5</a:thing>
<a:thing>stuff6</a:thing>
<a:thing>stuff7</a:thing>
<a:thing>stuff8</a:thing>
<a:thing>stuff9</a:thing>
</a:root>
I am using Nokogiri to query the document:
doc = Nokogiri::XML(open('text.xml'))
things = doc.xpath('//a:thing')
This fails with the following error:
Nokogiri::XML::XPath::SyntaxError: Undefined namespace prefix: //a:thing
From my research, I found out that I could specify the namespace for the prefix in the xpath method:
things = doc.xpath('//a:thing', a: 'nobody knows')
This returns an empty array.
What would be the best way for me to get the nodes that I need?
The problem is that the XML document never declares a namespace for the prefix. As a result, Nokogiri sees the node name as the literal string "a:root", rather than treating "a" as a namespace prefix and "root" as the node name:
xml = %Q{
<?xml version="1.0" encoding="UTF-8"?>
<a:root>
<a:thing>stuff0</a:thing>
<a:thing>stuff1</a:thing>
</a:root>
}
doc = Nokogiri::XML(xml)
puts doc.at_xpath('*').node_name
#=> "a:root"
puts doc.at_xpath('*').namespace
#=> ""
Solution 1 - Specify node name with colon
One solution is to search for nodes whose name is literally "a:thing". You cannot use //a:thing, since XPath will treat the "a" as a namespace prefix. You can get around this by using //*[name()="a:thing"]:
xml = %Q{
<?xml version="1.0" encoding="UTF-8"?>
<a:root>
<a:thing>stuff0</a:thing>
<a:thing>stuff1</a:thing>
</a:root>
}
doc = Nokogiri::XML(xml)
things = doc.xpath('//*[name()="a:thing"]')
puts things
#=> <a:thing>stuff0</a:thing>
#=> <a:thing>stuff1</a:thing>
Solution 2 - Modify the XML document to define the namespace
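An alternative solution is to modify the XML after reading it, but before parsing it, so that it properly defines the namespace. The file itself stays untouched; only the in-memory string changes. The document will then behave with namespaces as expected: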
xml = %Q{
<?xml version="1.0" encoding="UTF-8"?>
<a:root>
<a:thing>stuff0</a:thing>
<a:thing>stuff1</a:thing>
</a:root>
}
xml.gsub!('<a:root>', '<a:root xmlns:a="foo">')
doc = Nokogiri::XML(xml)
things = doc.xpath('//a:thing')
puts things
#=> <a:thing>stuff0</a:thing>
#=> <a:thing>stuff1</a:thing>