Loading YAML with line number for each key

Tags: ruby, yaml

Let's say I have a YAML file looking like this:

  en:
    errors:
      # Some comment
      format: "%{attribute} %{message}"

      # One more comment
      messages:
        "1": "Message 1"
        "2": "Message 2"

    long_error_message: |
      This is a
      multiline message

    date:
      format: "YYYY-MM-DD"

How can I read this into a Ruby Hash like this?

{
  'en': {
    'errors': {
      'format': { value: '%{attribute} %{message}', line: 4 },
      'messages': {
        '1': { value: 'Message 1', line: 8 },
        '2': { value: 'Message 2', line: 9 }
      }
    },
    'long_error_message': { value: "This is a\nmultiline message", line: 11 },
    'date': {
      'format': { value: 'YYYY-MM-DD', line: 16 }
    }
  }
}

I've tried using the tip mentioned in "YAML: Find line number of key?" as a starting point and implemented a Psych::Handler, but it felt like I had to rewrite a lot of Psych's code to get this to work.

Any ideas how I can solve this?

Gerhard Schlager asked Apr 05 '15 22:04


2 Answers

I've taken @matt's solution and created a version that requires no monkey-patching. It also handles values that span multiple lines and YAML's << operator.

require "psych"
require "pp"

ValueWithLineNumbers = Struct.new(:value, :lines)

class Psych::Nodes::ScalarWithLineNumber < Psych::Nodes::Scalar
  attr_reader :line_number

  def initialize(*args, line_number)
    super(*args)
    @line_number = line_number
  end
end

class Psych::TreeWithLineNumbersBuilder < Psych::TreeBuilder
  attr_accessor :parser

  def scalar(*args)
    node = Psych::Nodes::ScalarWithLineNumber.new(*args, parser.mark.line)
    @last.children << node
    node
  end
end

class Psych::Visitors::ToRubyWithLineNumbers < Psych::Visitors::ToRuby
  def visit_Psych_Nodes_ScalarWithLineNumber(node)
    visit_Psych_Nodes_Scalar(node)
  end

  private

  def revive_hash(hash, node)
    node.children.each_slice(2) do |k, v|
      key = accept(k)
      val = accept(v)

      if v.is_a? Psych::Nodes::ScalarWithLineNumber
        start_line = end_line = v.line_number + 1

        if k.is_a? Psych::Nodes::ScalarWithLineNumber
          start_line = k.line_number + 1
        end
        val = ValueWithLineNumbers.new(val, start_line..end_line)
      end

      if key == SHOVEL && k.tag != "tag:yaml.org,2002:str"
        case v
        when Psych::Nodes::Alias, Psych::Nodes::Mapping
          begin
            hash.merge! val
          rescue TypeError
            hash[key] = val
          end
        when Psych::Nodes::Sequence
          begin
            h = {}
            val.reverse_each do |value|
              h.merge! value
            end
            hash.merge! h
          rescue TypeError
            hash[key] = val
          end
        else
          hash[key] = val
        end
      else
        hash[key] = val
      end
    end

    hash
  end
end

# Usage (`yaml` is a String containing the YAML document):
handler = Psych::TreeWithLineNumbersBuilder.new
handler.parser = Psych::Parser.new(handler)

handler.parser.parse(yaml)

ruby_with_line_numbers =
  Psych::Visitors::ToRubyWithLineNumbers.create.accept(handler.root)

pp ruby_with_line_numbers

I've posted a gist of the above along with some comments and examples.
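
For readers unfamiliar with the << (merge key) branch in revive_hash, here is a small sketch of what plain Psych does with it. The YAML document and the key names (defaults, date, separator) are made up for illustration. The sketch goes through Psych.parse and to_ruby rather than Psych.load because, as far as I know, newer Psych versions reject aliases in Psych.load unless they are explicitly enabled.

```ruby
require "psych"

# Hypothetical document: `date` pulls in every key of `defaults` via `<<`.
yaml = <<~YAML
  defaults: &defaults
    format: "YYYY-MM-DD"
  date:
    <<: *defaults
    separator: "-"
YAML

# Parse to an AST first, then convert it with the stock ToRuby visitor.
data = Psych.parse(yaml).to_ruby

# The merged keys end up directly in the `date` hash.
p data["date"]
```

A custom revive_hash has to repeat this merging itself, which is why the answer copies that branch from Psych.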

John Carney answered Sep 22 '22 08:09


It looks like you want to take any scalar value that is a mapping value and replace it with a hash with a value key containing the original value, and a line key with the line number.

The following nearly works. The main problem is the multiline string, where the line number given is the start of the next thing in the YAML. By the time the handler's scalar method is called, the parser has already moved beyond the scalar of interest, so mark gives the line of the position at which the parser knows the scalar has ended. In most cases in your example this doesn't matter, but in the multiline case it gives the wrong value. I can't see any way of getting parser info from mark for the beginning of scalars without going into the Psych C code.

require 'psych'

# Psych's first step is to parse the Yaml into an AST of Node objects
# so we open the Node class and add a way to track the line.
class Psych::Nodes::Node
  attr_accessor :line
end

# We need to provide a handler that will add the line to the node
# as it is parsed. TreeBuilder is the "usual" handler, that
# creates the AST.
class LineNumberHandler < Psych::TreeBuilder

  # The handler needs access to the parser in order to call mark
  attr_accessor :parser

  # We are only interested in scalars, so here we override 
  # the method so that it calls mark and adds the line info
  # to the node.
  def scalar value, anchor, tag, plain, quoted, style
    mark = parser.mark
    s = super
    s.line = mark.line
    s
  end
end

# The next step is to convert the AST to a Ruby object.
# Psych does this using the visitor pattern with the ToRuby
# visitor. Here we patch ToRuby rather than inherit from it
# as it makes the last step a little easier.
class Psych::Visitors::ToRuby

  # This is the method for creating hashes. There may be problems
  # with Yaml mappings that have tags.
  def revive_hash hash, o
    o.children.each_slice(2) { |k,v|
      key = accept(k)
      val = accept(v)

      # This is the important bit. If the value is a scalar,
      # we replace it with the desired hash.
      if v.is_a? ::Psych::Nodes::Scalar
        val = { "value" => val, "line" => v.line + 1} # line is 0 based, so + 1
      end

      # Code dealing with << (for merging hashes) omitted.
      # If you need this you will probably need to copy it
      # in here. See the method:
      # https://github.com/tenderlove/psych/blob/v2.0.13/lib/psych/visitors/to_ruby.rb#L333-L365

      hash[key] = val
    }
    hash
  end
end

# get_yaml_from_wherever is a placeholder for however you obtain the YAML string
yaml = get_yaml_from_wherever

# Put it all together
handler = LineNumberHandler.new
parser = Psych::Parser.new(handler)
# Provide the handler with a reference to the parser
handler.parser = parser

# The actual parsing
parser.parse yaml
# We patched ToRuby rather than inherit so we can use to_ruby here
puts handler.root.to_ruby
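
As a possible alternative on newer Rubies: to my knowledge, Psych 3.0 and later (bundled with Ruby 2.5+) record a 0-based start_line on every AST node, which avoids the mark timing problem described above. The sketch below is not part of the original answer. The annotate helper is a hypothetical name, it assumes every mapping key is a scalar, and it does not handle aliases or << merge keys.

```ruby
require "psych"

# Hypothetical helper: walk a Psych AST and return nested Hashes in which
# every scalar mapping value becomes { value: ..., line: ... }.
# `line` is the 1-based line on which the key starts.
def annotate(node)
  case node
  when Psych::Nodes::Document
    annotate(node.children.first)
  when Psych::Nodes::Mapping
    node.children.each_slice(2).each_with_object({}) do |(key, value), hash|
      hash[key.value] =
        if value.is_a?(Psych::Nodes::Scalar)
          # start_line is 0-based, so add 1
          { value: value.to_ruby, line: key.start_line + 1 }
        else
          annotate(value)
        end
    end
  when Psych::Nodes::Sequence
    node.children.map { |child| annotate(child) }
  else
    # Scalars outside a mapping; aliases are not handled in this sketch
    node.to_ruby
  end
end

yaml = <<~'YAML'
  en:
    errors:
      # Some comment
      format: "%{attribute} %{message}"

      # One more comment
      messages:
        "1": "Message 1"
        "2": "Message 2"

    long_error_message: |
      This is a
      multiline message

    date:
      format: "YYYY-MM-DD"
YAML

# Psych.parse returns the first document of the stream as an AST
result = annotate(Psych.parse(yaml))
p result["en"]["errors"]["format"]
```

Because the line comes from the key node rather than from the parser's current position, the block scalar is reported on the line of its key.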

matt answered Sep 23 '22 08:09