Let's say I have a YAML file looking like this:
    en:
      errors:
        # Some comment
        format: "%{attribute} %{message}"

        # One more comment
        messages:
          "1": "Message 1"
          "2": "Message 2"

        long_error_message: |
          This is a
          multiline message

      date:
        format: "YYYY-MM-DD"
How can I read this into a Ruby Hash like this?
    {
      'en': {
        'errors': {
          'format': { value: '%{attribute} %{message}', line: 4 },
          'messages': {
            '1': { value: 'Message 1', line: 8 },
            '2': { value: 'Message 2', line: 9 }
          },
          'long_error_message': { value: "This is a\nmultiline message", line: 11 }
        },
        'date': {
          'format': { value: 'YYYY-MM-DD', line: 16 }
        }
      }
    }
I've tried using the tip mentioned in "YAML: Find line number of key?" as a starting point and implemented a Psych::Handler, but it felt like I had to rewrite lots of code from Psych in order to get this to work.
Any ideas how I can solve this?
I've taken @matt's solution and created a version that requires no monkey-patching. It also handles values that span multiple lines and YAML's << operator.
    require "psych"
    require "pp"

    ValueWithLineNumbers = Struct.new(:value, :lines)

    # A scalar node that remembers the parser's line at the time it was emitted.
    class Psych::Nodes::ScalarWithLineNumber < Psych::Nodes::Scalar
      attr_reader :line_number

      def initialize(*args, line_number)
        super(*args)
        @line_number = line_number
      end
    end

    class Psych::TreeWithLineNumbersBuilder < Psych::TreeBuilder
      attr_accessor :parser

      def scalar(*args)
        # parser.mark.line is zero-based
        node = Psych::Nodes::ScalarWithLineNumber.new(*args, parser.mark.line)
        @last.children << node
        node
      end
    end

    class Psych::Visitors::ToRubyWithLineNumbers < Psych::Visitors::ToRuby
      def visit_Psych_Nodes_ScalarWithLineNumber(node)
        visit_Psych_Nodes_Scalar(node)
      end

      private

      def revive_hash(hash, node)
        node.children.each_slice(2) do |k, v|
          key = accept(k)
          val = accept(v)

          if v.is_a? Psych::Nodes::ScalarWithLineNumber
            # The value's mark is taken once the scalar has ended, so it is
            # used as the end line; the key's line serves as the start line.
            start_line = end_line = v.line_number + 1

            if k.is_a? Psych::Nodes::ScalarWithLineNumber
              start_line = k.line_number + 1
            end
            val = ValueWithLineNumbers.new(val, start_line..end_line)
          end

          # Handling of the << merge key
          if key == SHOVEL && k.tag != "tag:yaml.org,2002:str"
            case v
            when Psych::Nodes::Alias, Psych::Nodes::Mapping
              begin
                hash.merge! val
              rescue TypeError
                hash[key] = val
              end
            when Psych::Nodes::Sequence
              begin
                h = {}
                val.reverse_each do |value|
                  h.merge! value
                end
                hash.merge! h
              rescue TypeError
                hash[key] = val
              end
            else
              hash[key] = val
            end
          else
            hash[key] = val
          end
        end

        hash
      end
    end

    # Usage (yaml holds the YAML source to parse):
    handler = Psych::TreeWithLineNumbersBuilder.new
    handler.parser = Psych::Parser.new(handler)

    handler.parser.parse(yaml)

    ruby_with_line_numbers =
      Psych::Visitors::ToRubyWithLineNumbers.create.accept(handler.root)

    pp ruby_with_line_numbers
I've posted a gist of the above along with some comments and examples
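For readers unfamiliar with `<<`: it is YAML's merge key, which copies the entries of an aliased mapping into the current one. The following is a small illustration using stock Psych only, not the classes above. The keys and values are made up for the example, `load_with_merge` is a hypothetical helper name, and the `aliases: true` keyword assumes Psych 3.1 or later.

```ruby
require "psych"

# Illustrative document: "development" pulls in "defaults" via the merge key
# and then overrides one entry.
MERGE_YAML = <<~YAML
  defaults: &defaults
    adapter: postgres
    host: localhost
  development:
    <<: *defaults
    host: dev.example.com
YAML

# Hypothetical helper: load YAML with aliases enabled so that
# "<<: *defaults" can be resolved.
def load_with_merge(yaml)
  Psych.safe_load(yaml, aliases: true)
end

config = load_with_merge(MERGE_YAML)
puts config["development"].inspect
```

Keys written after the `<<` entry override the merged ones, which is why any custom `revive_hash` has to reproduce this branch if merge keys are in use.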
It looks like you want to take any scalar value that is a mapping value and replace it with a hash with a value key containing the original value, and a line key with the line number.
The code below nearly works. The main problem is the multiline string, where the line number given is the start of the next thing in the YAML. By the time the handler's scalar method is called, the parser has already moved beyond the scalar of interest, so mark gives the line of the position at which the parser knows the scalar has ended. In most cases in your example this doesn't matter, but in the multiline case it gives the wrong value. I can't see any way of getting position info from mark for the beginning of scalars without going into the Psych C code.
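Before the full code, the lag described above can be seen in isolation. This is a minimal sketch rather than part of the solution: `MarkRecorder`, `scalar_marks` and `SAMPLE_YAML` are illustrative names, and it relies only on `Psych::Parser#mark` reporting the parser's current zero-based position at the moment each scalar event fires.

```ruby
require "psych"

# Illustrative document with a one-line value, a block scalar, and
# something after the block scalar.
SAMPLE_YAML = <<~YAML
  short: "one line"
  long: |
    This is a
    multiline message
  after: "next thing"
YAML

# Records [value, one-based line] each time a scalar event fires.
class MarkRecorder < Psych::Handler
  attr_accessor :parser
  attr_reader :seen

  def initialize
    @seen = []
  end

  def scalar(value, anchor, tag, plain, quoted, style)
    # mark.line is zero-based, so add 1
    @seen << [value, parser.mark.line + 1]
  end
end

# Hypothetical helper: parse the YAML and return the recorded pairs.
def scalar_marks(yaml)
  recorder = MarkRecorder.new
  recorder.parser = Psych::Parser.new(recorder)
  recorder.parser.parse(yaml)
  recorder.seen
end

scalar_marks(SAMPLE_YAML).each do |value, line|
  puts "#{value.inspect} reported at line #{line}"
end
```

The block scalar starts on line 2 of this sample, yet the line reported for it is a later one, because the parser has already read past its content by the time the event is delivered.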
    require 'psych'

    # Psych's first step is to parse the Yaml into an AST of Node objects
    # so we open the Node class and add a way to track the line.
    class Psych::Nodes::Node
      attr_accessor :line
    end

    # We need to provide a handler that will add the line to the node
    # as it is parsed. TreeBuilder is the "usual" handler, that
    # creates the AST.
    class LineNumberHandler < Psych::TreeBuilder

      # The handler needs access to the parser in order to call mark
      attr_accessor :parser

      # We are only interested in scalars, so here we override
      # the method so that it calls mark and adds the line info
      # to the node.
      def scalar value, anchor, tag, plain, quoted, style
        mark = parser.mark
        s = super
        s.line = mark.line
        s
      end
    end

    # The next step is to convert the AST to a Ruby object.
    # Psych does this using the visitor pattern with the ToRuby
    # visitor. Here we patch ToRuby rather than inherit from it
    # as it makes the last step a little easier.
    class Psych::Visitors::ToRuby

      # This is the method for creating hashes. There may be problems
      # with Yaml mappings that have tags.
      def revive_hash hash, o
        o.children.each_slice(2) { |k,v|
          key = accept(k)
          val = accept(v)

          # This is the important bit. If the value is a scalar,
          # we replace it with the desired hash.
          if v.is_a? ::Psych::Nodes::Scalar
            val = { "value" => val, "line" => v.line + 1} # line is 0 based, so + 1
          end

          # Code dealing with << (for merging hashes) omitted.
          # If you need this you will probably need to copy it
          # in here. See the method:
          # https://github.com/tenderlove/psych/blob/v2.0.13/lib/psych/visitors/to_ruby.rb#L333-L365

          hash[key] = val
        }
        hash
      end
    end

    yaml = get_yaml_from_wherever

    # Put it all together
    handler = LineNumberHandler.new
    parser = Psych::Parser.new(handler)
    # Provide the handler with a reference to the parser
    handler.parser = parser

    # The actual parsing
    parser.parse yaml

    # We patched ToRuby rather than inherit so we can use to_ruby here
    puts handler.root.to_ruby
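A possible alternative on newer Rubies, offered as a hedged sketch rather than as part of the answer above: it assumes Psych 3.1 or later (bundled with Ruby 2.6+), where each parsed node carries a zero-based `start_line` attribute. Under that assumption the start of a multiline scalar can be read from the node itself, without `mark` and without patching. `with_lines` and `QUESTION_YAML` are hypothetical names; the helper assumes scalar keys, ignores aliases and `<<` merge keys, and returns raw strings rather than type-converted values.

```ruby
require "psych"

# The YAML from the question, reproduced for the example.
QUESTION_YAML = <<~'YAML'
  en:
    errors:
      # Some comment
      format: "%{attribute} %{message}"

      # One more comment
      messages:
        "1": "Message 1"
        "2": "Message 2"

      long_error_message: |
        This is a
        multiline message

    date:
      format: "YYYY-MM-DD"
YAML

# Walks the Psych AST and wraps every scalar mapping value in
# { "value" => ..., "line" => ... }, using the node's own start_line.
def with_lines(node)
  case node
  when Psych::Nodes::Stream, Psych::Nodes::Document
    with_lines(node.children.first)
  when Psych::Nodes::Mapping
    node.children.each_slice(2).to_h do |key, value|
      converted =
        if value.is_a?(Psych::Nodes::Scalar)
          # start_line is zero-based, so add 1
          { "value" => value.value, "line" => value.start_line + 1 }
        else
          with_lines(value)
        end
      [key.value, converted]
    end
  when Psych::Nodes::Sequence
    node.children.map { |child| with_lines(child) }
  when Psych::Nodes::Scalar
    node.value
  end
end

pp with_lines(Psych.parse(QUESTION_YAML))
```

Because the line comes from where the scalar begins rather than from where the parser currently is, the block scalar no longer picks up the line of whatever follows it.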