YAML data exchange issues between Perl and Ruby

Tags: ruby, yaml, perl

I'm having trouble with data being exchanged between Perl and Ruby via YAML. I have some values that look like number:number, such as 1:16.

Perl's YAML libraries (YAML::Tiny and YAML::XS) encode this as 1:16 without quotes. Ruby's YAML library (Psych) does not interpret this as a string; instead it somehow becomes the Fixnum value 4560. I can't figure out how to fix this conversion issue on either side.

Every value in the YAML for my use case should be an object or string. So, I could tell the Perl YAML library to quote all values, if such an option existed. Or is there any way to tell the Ruby YAML library to interpret all values as strings? Any ideas?

Changing the language on either side is not logistically an option.

Perl:

use YAML::XS qw(DumpFile);
my $foo={'abc'=>'1:16'};
DumpFile('test.yaml',$foo);

Ruby:

require('yaml')
foo=YAML.load_file('test.yaml')
puts(foo['abc'])

The Ruby code will print 4560. One of the comments worked out how you get 4560 from 1:16: it's 1 hour 16 minutes converted to seconds. Uh, okay.

Douglas Mauch asked Sep 26 '12 20:09


2 Answers

According to the Yaml 1.1 spec, 1:16 is an integer in sexagesimal (base 60) format.

See also http://yaml.org/type/int.html, which says:

Using “:” allows expressing integers in base 60, which is convenient for time and angle values.

The Yaml parser included in Ruby, Psych, recognises this format and converts the value into an integer, but wrongly: 1:16 should be 76 (1 × 60 + 16). The Psych code seems to assume that all such values will be in the form a:b:c, but the regex doesn't enforce that. The Perl emitter (at least YAML::XS, which I tested) doesn't recognise this format, so it doesn't quote the string when writing the file. YAML::XS does recognise and quote some integers, but not all. YAML::XS also doesn't recognise many other formats (e.g. dates) that Psych does.
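To make the arithmetic concrete, here is a small sketch contrasting the spec reading of a base-60 integer with a position weighting that reproduces the 4560 result (each group multiplied by 60 raised to its distance from the third position, which is right only for a:b:c values). The helper names are made up for illustration, and the second method is a reconstruction of the behaviour described above, not Psych's actual code.

```ruby
# Spec reading (YAML 1.1 int, sexagesimal form): each colon-separated group
# is one base-60 digit, most significant first. Underscores are ignored.
def sexagesimal_to_i(str)
  sign = str.start_with?('-') ? -1 : 1
  groups = str.sub(/\A[-+]/, '').delete('_').split(':').map(&:to_i)
  sign * groups.reduce(0) { |acc, digit| acc * 60 + digit }
end

# Hypothetical reconstruction of the faulty weighting: group e (counting from
# the left, starting at 0) is multiplied by 60**|e - 2|, i.e. it always treats
# the first group as hours, so it is only correct for exactly three groups.
def psych_style_to_i(str)
  str.split(':').each_with_index.sum { |group, e| group.to_i * 60**(e - 2).abs }
end

puts sexagesimal_to_i('1:16')       # 76
puts psych_style_to_i('1:16')       # 4560
puts sexagesimal_to_i('190:20:30')  # 685230
puts psych_style_to_i('190:20:30')  # 685230
```

Both agree on three-group values such as 190:20:30 (the example from the yaml.org int type page), which is why the problem only shows up with two-group values like 1:16.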

(It appears that the sexagesimal format has been removed from the Yaml 1.2 spec.)

Psych allows a great deal of flexibility in its parsing: YAML.load_file is just a simple interface for the common use cases.

You could use the parse methods of Psych to create a tree representation of the yaml, then convert this into a Ruby data structure using a custom ScalarScanner (which is the object that converts strings of certain formats to the appropriate Ruby type):

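require('yaml')

class MyScalarScanner < Psych::ScalarScanner
  def tokenize string
    # this is the same regexp as Psych uses to detect base 60 ints:
    return string if string =~ /^[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+$/
    super
  end
end

tree = YAML::parse_file 'test.yaml'
# Newer versions of Psych require a class loader for both the scanner and
# the visitor; older versions took no arguments here.
class_loader = Psych::ClassLoader.new
foo = Psych::Visitors::ToRuby.new(MyScalarScanner.new(class_loader), class_loader).accept tree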

This is basically the same process that occurs when you use YAML.load_file, except that it uses the customised scanner class.

A similar alternative would be to open up ScalarScanner and replace the tokenize method with the customised one. This would allow you to use the simpler load_file interface, but with the usual caveats about monkey patching classes:

class Psych::ScalarScanner
  alias :orig_tokenize :tokenize
  def tokenize string
    return string if string =~ /^[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+$/
    orig_tokenize string
  end
end

foo = YAML.load_file 'test.yaml'
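On Ruby 2.0 and later, Module#prepend is a somewhat tidier way to do the same patch, since `super` reaches the original tokenize without keeping an alias around. This is a sketch only; the module name is made up, and like the alias version it changes how every YAML load in the process treats these values.

```ruby
require 'yaml'

# Hypothetical module: intercepts Psych::ScalarScanner#tokenize and keeps
# base-60 style integers (e.g. 1:16) as plain strings.
module KeepSexagesimalStrings
  SEXAGESIMAL_INT = /^[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+$/

  def tokenize(string)
    return string if string.match?(SEXAGESIMAL_INT)
    super # everything else goes through Psych's normal type detection
  end
end

Psych::ScalarScanner.prepend(KeepSexagesimalStrings)

foo = YAML.load("abc: 1:16\ncount: 42\n")
puts foo['abc']    # 1:16
puts foo['count']  # 42
```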

Note that these examples only take into consideration values with a format like 1:16. Depending on what your Perl program is emitting you may need to override other patterns too. One in particular that you might want to look at is sexagesimal floats (e.g. 1:16.44).
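Since the question says every value should be an object or a string, a blunter variant of the same custom-scanner approach is to skip type detection altogether. The sketch below assumes a newer Psych (scanner and visitor take a class loader); StringOnlyScanner and load_all_strings are hypothetical names.

```ruby
require 'yaml'

# Hypothetical scanner that never converts plain scalars: every unquoted
# value comes back exactly as written in the file.
class StringOnlyScanner < Psych::ScalarScanner
  def tokenize(string)
    string
  end
end

# Parse to a node tree, then build Ruby objects with the custom scanner.
def load_all_strings(yaml)
  class_loader = Psych::ClassLoader.new
  scanner = StringOnlyScanner.new(class_loader)
  Psych::Visitors::ToRuby.new(scanner, class_loader).accept(Psych.parse(yaml))
end

data = load_all_strings("abc: 1:16\ntime: 1:16.44\ncount: 12\nflag: true\n")
p data # every value is a String
```

Two side effects to be aware of: an empty value (e.g. `key:`) becomes "" rather than nil, and values with an explicit tag such as `!!int` are still converted, because Psych handles tagged scalars without going through tokenize.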

matt answered Oct 23 '22 01:10


There's a bug in the parser you are using. It seems to think 1:16 is some kind of time (since 4560 is the number of seconds in one hour and 16 minutes), but I find nothing that validates that interpretation.

The best solution would be to use a parser that isn't buggy.

  • libyaml, used by YAML::XS, supposedly has Ruby bindings.
  • libsyck, used by YAML::Syck, supposedly has Ruby bindings.

An alternative is to generate YAML where the strings are always quoted (or at least when they would be treated as a time).

YAML::Syck has an option to do exactly that.

$ perl -e'
   use YAML::Syck qw( Dump );
   local $YAML::Syck::SingleQuote = 1;
   print(Dump({abc=>"1:16"}));
'
--- 
"abc": '1:16'

(Don't know how I missed this option earlier!)
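On the Ruby side nothing special is needed once the value is quoted: Psych returns quoted scalars as-is, without running them through its type detection. A quick check using the YAML::Syck output above as an inline string instead of a file:

```ruby
require 'yaml'

# Output of the YAML::Syck example above (SingleQuote enabled).
syck_output = "--- \n\"abc\": '1:16'\n"

foo = YAML.load(syck_output)
puts foo['abc']        # 1:16
puts foo['abc'].class  # String
```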

ikegami answered Oct 23 '22 00:10