I'm using Ruby (ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu] on my machine, ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux] in the production environment) and Nori to convert an XML document (initially processed with Nokogiri for some validation) into a Ruby Hash, but I later discovered that Nori is dropping the attributes of the deepest XML elements.
For the conversion, I'm using code similar to the following:
xml = Nokogiri::XML(File.open('file.xml')) { |config| config.strict.noblanks }
hash = Nori.new.parse xml.to_s
The code generally works as intended, except for one case. Whenever Nori parses the XML text, it drops element attributes from the leaf elements (i.e. elements that have no child elements).
For example, the following document:
<?xml version="1.0"?>
<root>
<objects>
<object>
<fields>
<id>1</id>
<name>The name</name>
<description>A description</description>
</fields>
</object>
</objects>
</root>
...is converted to the expected Hash (some output omitted for brevity):
irb(main):066:0> xml = Nokogiri::XML(txt) { |config| config.strict.noblanks }
irb(main):071:0> ap Nori.new.parse(xml.to_s), :indent => -2
{
  "root" => {
    "objects" => {
      "object" => {
        "fields" => {
          "id" => "1",
          "name" => "The name",
          "description" => "A description"
        }
      }
    }
  }
}
The problem shows up when element attributes are used on elements with no children. For example, the following document is not converted as expected:
<?xml version="1.0"?>
<root>
<objects>
<object id="1">
<fields>
<field name="Name">The name</field>
<field name="Description">A description</field>
</fields>
</object>
</objects>
</root>
The same Nori.new.parse(xml.to_s) call, as displayed by awesome_print, shows that the attributes of the deepest <field> elements are absent:
irb(main):131:0> ap Nori.new.parse(xml.to_s), :indent => -2
{
"root" => {
"objects" => {
"object" => {
"fields" => {
"field" => [
[0] "The name",
[1] "A description"
]
},
"@id" => "1"
}
}
}
}
The Hash only has their values as a list, which is not what I wanted. I expected the <field> elements to retain their attributes just like their parent elements (e.g. see @id="1" for <object>), not for their attributes to get chopped off.
Even if the document is modified to look as follows, it still doesn't work as expected:
<?xml version="1.0"?>
<root>
<objects>
<object id="1">
<fields>
<Name type="string">The name</Name>
<Description type="string">A description</Description>
</fields>
</object>
</objects>
</root>
It produces the following Hash:
{
"root" => {
"objects" => {
"object" => {
"fields" => {
"Name" => "The name",
"Description" => "A description"
},
"@id" => "1"
}
}
}
}
This lacks the type="whatever" attributes for each field entry.
Searching eventually led me to Issue #59, where the last post (from Aug 2015) says the poster can't "find the bug in Nori's code."
So, my question is: Are any of you aware of a way to work around the Nori issue (e.g. perhaps a setting) that would allow me to use my original schema (i.e. the one with attributes in elements with no children)? If so, can you share a code snippet that will handle this correctly?
I had to redesign my XML schema and change the code about three times to make it work, so if there's a way to get Nori to behave, and I'm simply not aware of it, I'd like to know what it is.
I'd like to avoid installing more libraries as much as possible just to get this working properly with the schema structure I originally wanted to use, but I'm open to the possibility if it's proven to work. (I'd have to re-factor the code once again...) Frameworks are definitely overkill for this, so please: do not suggest Ruby on Rails or similar full-stack solutions.
Please note that my current solution, based on a (reluctantly) redesigned schema, is working, but it's more complicated to generate and process than the original one, and I'd like to go back to the simpler/shallower schema.
Nori is not actually dropping the attributes; they are just not being printed.
If you run the following Ruby script:
require 'nori'
data = Nori.new(empty_tag_value: true).parse(<<XML)
<?xml version="1.0"?>
<root>
<objects>
<object>
<fields>
<field name="Name">The name</field>
<field name="Description">A description</field>
</fields>
</object>
</objects>
</root>
XML
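field_list = data['root']['objects']['object']['fields']['field']
# Print the list the way p/inspect shows it (attributes are not visible here).
puts field_list.inspect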
puts "text: '#{field_list[0]}' data: #{field_list[0].attributes}"
puts "text: '#{field_list[1]}' data: #{field_list[1].attributes}"
You should get the following output:
["The name", "A description"]
text: 'The name' data: {"name"=>"Name"}
text: 'A description' data: {"name"=>"Description"}
This clearly shows that the attributes are there but are not displayed by the inspect method (p(x) being the same as puts x.inspect).
You will notice that puts field_list.inspect outputs ["The name", "A description"], but field_list[0].attributes returns the attribute key and data.
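To see why inspect hides the attributes, here is a minimal sketch that does not need Nori at all. TaggedString is a hypothetical stand-in that I am assuming behaves like the objects above: a String subclass with an extra attributes accessor.

```ruby
# TaggedString is a hypothetical stand-in, not part of Nori. It assumes the
# parsed leaf values are String subclasses that carry an attributes hash.
class TaggedString < String
  attr_accessor :attributes
end

field = TaggedString.new("The name")
field.attributes = { "name" => "Name" }

# String#inspect only renders the characters, so the hash is invisible here.
puts field.inspect

# The attributes are still reachable through the accessor.
puts field.attributes.inspect
```

Anything that relies on inspect (p, pp, irb echo) will show only the quoted text for such an object, which matches what you saw.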
If you would like pp to display this, you can override the inspect method in Nori::StringWithAttributes:
class Nori
  class StringWithAttributes < String
    def inspect
      [attributes, String.new(self)].inspect
    end
  end
end
Or, if you want to change the output, you could override the self.new method to have it return a different data structure:
class Nori
  class MyText < Array
    def attributes=(data)
      self[1] = data
    end
    attr_accessor :text
    def initialize(text)
      self[0] = text
      self[1] = {}
    end
  end
  class StringWithAttributes < String
    def self.new(x)
      MyText.new(x)
    end
  end
end
You can then access the data as a tuple:
puts "text: '#{data['root']['objects']['object']['fields']['field'][0].first}' data: #{ data['root']['objects']['object']['fields']['field'][0].last}"
This would let you emit the data as JSON or YAML, since the text items would look like arrays with 2 elements.
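As a rough check of the JSON part, here is a sketch using only the standard library. TextWithAttributes is a hypothetical stand-in for a two-element Array subclass like the one above; it is not a Nori class.

```ruby
require 'json'

# Hypothetical stand-in for an Array subclass that stores [text, attributes].
class TextWithAttributes < Array
  def initialize(text)
    super()
    self[0] = text
    self[1] = {}
  end

  # Store the attributes hash in the second slot.
  def attributes=(data)
    self[1] = data
  end
end

item = TextWithAttributes.new("The name")
item.attributes = { "name" => "Name" }

# The Array subclass is serialized as an ordinary two-element JSON array.
json = JSON.generate("field" => [item])
puts json

# Parsing it back yields plain Arrays and Hashes.
round_trip = JSON.parse(json)
```

After the round trip, each text item is a plain [text, attributes] pair, so consumers of the JSON do not need to know about the custom class.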
pp also works:
{"root"=>
{"objects"=>
{"object"=>
{"fields"=>
{"field"=>
[["The name", {"name"=>"Name"}],
["A description", {"name"=>"Description"}]]},
"bob"=>[{"@id"=>"id1"}, {"@id"=>"id2"}],
"bill"=>
[{"p"=>["one", {}], "@id"=>"bid1"}, {"p"=>["two", {}], "@id"=>"bid2"}],
"@id"=>"1"}}}}
This should do what you want.
require 'awesome_print'
require 'nori'
# Copyright (c) 2016 G. Allen Morris III
#
# Awesome Print is freely distributable under the terms of MIT license.
# See LICENSE file or http://www.opensource.org/licenses/mit-license.php
#------------------------------------------------------------------------------
module AwesomePrint
  module Nori
    def self.included(base)
      base.send :alias_method, :cast_without_nori, :cast
      base.send :alias_method, :cast, :cast_with_nori
    end
    # Add Nori XML Node and NodeSet names to the dispatcher pipeline.
    #-------------------------------------------------------------------
    def cast_with_nori(object, type)
      cast = cast_without_nori(object, type)
      if defined?(::Nori::StringWithAttributes) && object.is_a?(::Nori::StringWithAttributes)
        cast = :nori_xml_node
      end
      cast
    end
    #-------------------------------------------------------------------
    def awesome_nori_xml_node(object)
      return %Q|["#{object}", #{object.attributes}]|
    end
  end
end
AwesomePrint::Formatter.send(:include, AwesomePrint::Nori)
data = Nori.new(empty_tag_value: true).parse(<<XML)
<?xml version="1.0"?>
<root>
<objects>
<object>
<fields>
<field name="Name">The name</field>
<field name="Description">A description</field>
</fields>
</object>
</objects>
</root>
XML
ap data
The output is:
{
"root" => {
"objects" => {
"object" => {
"fields" => {
"field" => [
[0] ["The name", {"name"=>"Name"}],
[1] ["A description", {"name"=>"Description"}]
]
}
}
}
}
}
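If you would rather not patch Nori or awesome_print, another option is to convert the parsed result into plain hashes once, after parsing. The sketch below is only an illustration: expose_attributes is a hypothetical helper, the "#text" key is my own choice, and the "@" prefix simply mirrors the "@id" key shown earlier. It assumes leaf strings respond to attributes, and TaggedString stands in for them so the example runs without Nori.

```ruby
# Hypothetical stand-in for a parsed leaf string that carries attributes.
class TaggedString < String
  attr_accessor :attributes
end

# Hypothetical helper: walk the structure and replace every string that has
# attributes with a plain Hash, so any printer or serializer can see them.
def expose_attributes(node)
  case node
  when Hash
    node.each_with_object({}) { |(key, value), out| out[key] = expose_attributes(value) }
  when Array
    node.map { |item| expose_attributes(item) }
  when String
    attrs = node.respond_to?(:attributes) ? (node.attributes || {}) : {}
    return node if attrs.empty?
    # "#text" holds the element text; each attribute gets an "@" prefix.
    result = { "#text" => String.new(node) }
    attrs.each { |attr_name, attr_value| result["@#{attr_name}"] = attr_value }
    result
  else
    node
  end
end

name = TaggedString.new("The name")
name.attributes = { "name" => "Name" }
parsed = { "fields" => { "field" => [name, "plain"] } }

plain = expose_attributes(parsed)
puts plain.inspect
```

With real data you would pass the result of Nori.new.parse(...) instead of the hand-built parsed hash. Strings without attributes are left untouched, so only the affected leaf elements change shape.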