Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Converting xml into a native Ruby data structure

Tags:

xml

ruby

I'm grabbing data from an api that is returning xml like this:

<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>

I'm new to deserialization but what I think is appropriate is to parse this xml into a ruby object that I can then reference like objectFoo.seriess.series.frequency that would return 'Quarterly'.

From my searches here and on google there doesn't seem to be an obvious solution to this in Ruby (NOT rails) which makes me think I'm missing something rather obvious. Any ideas?

Edit I setup a test case based upon Winfield's suggestion.

class Exopenstruct

  require 'ostruct'

  def initialize()  

  hash = {"seriess"=>{"realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "series"=>{"id"=>"GDPC1", "realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "title"=>"Real Gross Domestic Product, 1 Decimal", "observation_start"=>"1947-01-01", "observation_end"=>"2012-10-01", "frequency"=>"Quarterly", "frequency_short"=>"Q", "units"=>"Billions of Chained 2005 Dollars", "units_short"=>"Bil. of Chn. 2005 $", "seasonal_adjustment"=>"Seasonally Adjusted Annual Rate", "seasonal_adjustment_short"=>"SAAR", "last_updated"=>"2013-01-30 07:46:54-06", "popularity"=>"93", "notes"=>"Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States.\n\nFor more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"}}}

  object_instance = OpenStruct.new( hash )

  end
end

In irb I loaded the rb file and instantiated the class. However, when I tried to access an attribute (e.g. instance.seriess) I received: NoMethodError: undefined method `seriess'

Again apologies if I'm missing something obvious.

like image 611
JohnGalt Avatar asked Nov 29 '22 13:11

JohnGalt


2 Answers

You may be better off using standard XML to Hash parsing, such as included with Rails:

object_hash = Hash.from_xml(xml_string)
puts object_hash['seriess']

If you aren't using a Rails stack, you can use a library like Nokogiri for the same behavior.

EDIT: If you're looking for object behavior, using OpenStruct is a great way to wrap the hash for this:

object_instance = OpenStruct.new( Hash.from_xml(xml_string) )
puts object_instance.seriess

NOTE: For deeply nested data, you may need to recursively convert embedded hashes into OpenStruct instances as well. IE: if attribute above is a hash of values, it will be a hash and not an OpenStruct.

like image 103
Winfield Avatar answered Dec 24 '22 17:12

Winfield


I've just started using Damien Le Berrigaud's fork of HappyMapper and I'm really pleased with it. You define simple Ruby classes and include HappyMapper. When you call parse, it uses Nokogiri to slurp in the XML and you get back a complete tree of bona-fide Ruby objects.

I've used it to parse multi-megabyte XML files and found it to be fast and dependable. Check out the README.

One hint: since XML file encoding strings sometimes lie, you may need to sanitize your XML like this:

def sanitize(xml)
  xml.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

before passing it to the #parse method in order to avoid Nokogiri's Input is not proper UTF-8, indicate encoding ! error.

update

I went ahead and cast the OP's example into HappyMapper:

XML_STRING = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'

class Series; end;              # fwd reference

class Seriess
  include HappyMapper
  tag 'seriess'

  attribute :realtime_start, Date
  attribute :realtime_end, Date
  has_many :seriess, Series, :tag => 'series'
end
class Series
  include HappyMapper
  tag 'series'

  attribute 'id', String
  attribute 'realtime_start', Date
  attribute 'realtime_end', Date
  attribute 'title', String
  attribute 'observation_start', Date
  attribute 'observation_end', Date
  attribute 'frequency', String
  attribute 'frequency_short', String
  attribute 'units', String
  attribute 'units_short', String
  attribute 'seasonal_adjustment', String
  attribute 'seasonal_adjustment_short', String
  attribute 'last_updated', DateTime
  attribute 'popularity', Integer
  attribute 'notes', String
end

def test
  Seriess.parse(XML_STRING, :single => true)
end

and here's what you can do with it:

>> a = test
>> a.class
Seriess
>> a.seriess.first.frequency
=> "Quarterly"
>> a.seriess.first.observation_start
=> #<Date: 1947-01-01 ((2432187j,0s,0n),+0s,2299161j)>
>> a.seriess.first.popularity
=> 93
like image 27
fearless_fool Avatar answered Dec 24 '22 17:12

fearless_fool