I need to open a YAML file with aliases used inside it:
defaults: &defaults
  foo: bar
  zip: button
node:
  <<: *defaults
  foo: other
This obviously expands out to an equivalent YAML document of:
defaults:
  foo: bar
  zip: button
node:
  foo: other
  zip: button
which is how YAML::load reads it.
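A minimal sketch of that behaviour, for reference. It converts the document with Psych.parse(...).to_ruby rather than YAML::load, so that alias handling does not depend on the safe-loading defaults of the installed Psych version; the variable names are illustrative only.

```ruby
require 'psych'

yaml = <<~EOS
  defaults: &defaults
    foo: bar
    zip: button
  node:
    <<: *defaults
    foo: other
EOS

# Parse to an AST and convert it to plain Ruby objects. The merge key is
# expanded during this conversion, so "node" becomes an independent hash.
data = Psych.parse(yaml).to_ruby
p data["node"]

# Dumping the loaded data writes the expanded form: the named anchor and
# the merge key from the original document are not reproduced.
dumped = Psych.dump(data)
puts dumped
```

This is the round-trip loss described in the question: once the merge has been applied, nothing in the Ruby data records that it came from an alias.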
I need to set new keys in this YAML document and then write it back out to disk, preserving the original structure as much as possible.
I have looked at YAML::Store, but this completely destroys the aliases and anchors.
Is there anything available that could do something along the lines of:
thing = Thing.load("config.yml")
thing[:node][:foo] = "yet another"
Saving the document back as:
defaults: &defaults
  foo: bar
  zip: button
node:
  <<: *defaults
  foo: yet another
?
I opted to use YAML for this due to the fact it handles this aliasing well, but writing YAML that contains aliases appears to be a bit of a bleak-looking playing field in reality.
Introducing YAML anchors: we define an anchor using the &some_name syntax immediately before the YAML node we want that anchor to point to. We can then use the *some_name syntax later in the YAML to reference that anchor as many times as we want.
YAML Aliases allow you to assign a name to a value or block of data and recall the assigned data by its name in the YAML file. Aliases should work for any file written in YAML.
YAML (YAML Ain't Markup Language) is a human-readable data-serialization language. It is commonly used for configuration files, but it is also used in data storage (e.g. debugging output) or transmission (e.g. document headers).
The use of << to indicate that an aliased mapping should be merged into the current mapping isn't part of the core YAML spec, but it is part of the tag repository.
The current YAML library provided by Ruby, Psych, provides the dump and load methods, which allow easy serialization and deserialization of Ruby objects and use the various implicit type conversions in the tag repository, including << to merge hashes. It also provides tools for lower-level YAML processing if you need it. Unfortunately, it doesn't easily allow selectively disabling or enabling specific parts of the tag repository; it's an all-or-nothing affair. In particular, the handling of << is pretty baked into the handling of hashes.
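As a rough illustration of that lower-level processing, here is a sketch that edits the parsed AST directly instead of converting it to Ruby objects. It is an alternative to the approach described next, not part of it, and it assumes the document has the shape shown in the question. mapping_value is a hypothetical helper written for this example, not a Psych method.

```ruby
require 'psych'

yaml = <<~EOS
  defaults: &defaults
    foo: bar
    zip: button
  node:
    <<: *defaults
    foo: other
EOS

# Hypothetical helper: return the value node stored under `key` in a
# mapping node, whose children alternate key, value, key, value, ...
def mapping_value(mapping, key)
  mapping.children.each_slice(2) do |k, v|
    return v if k.is_a?(Psych::Nodes::Scalar) && k.value == key
  end
  nil
end

# parse_stream returns the AST without converting it to Ruby objects, so
# anchors, aliases and the literal << key are all still present.
stream = Psych.parse_stream(yaml)
root   = stream.children.first.root

anchors = stream.grep(Psych::Nodes::Mapping).map(&:anchor).compact
aliases = stream.grep(Psych::Nodes::Alias).map(&:anchor)
p anchors
p aliases

# Change one scalar in place, then emit the tree again.
node = mapping_value(root, "node")
mapping_value(node, "foo").value = "yet another"

output = stream.to_yaml
puts output
```

The trade-off is that you work with nodes rather than hashes, so adding new keys means constructing Psych::Nodes::Scalar objects yourself.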
One way to achieve what you want is to provide your own subclass of Psych's ToRuby class and override its revive_hash method, so that it just treats mapping keys of << as literals. This involves overriding a private method in Psych, so you need to be a little careful:
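require 'psych'

class ToRubyNoMerge < Psych::Visitors::ToRuby
  # Same as Psych's revive_hash, minus the special handling of "<<" keys.
  # The optional third argument matches the signature in newer Psych versions.
  def revive_hash hash, o, tagged = false
    @st[o.anchor] = hash if o.anchor

    o.children.each_slice(2) { |k,v|
      key = accept(k)
      hash[key] = accept(v)
    }
    hash
  end
end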
You would then use it like this:
tree = Psych.parse your_data
# ToRuby.new takes arguments in current Psych; create supplies them for you
data = ToRubyNoMerge.create.accept tree
With the YAML from your example, data would then look something like
{"defaults"=>{"foo"=>"bar", "zip"=>"button"},
"node"=>{"<<"=>{"foo"=>"bar", "zip"=>"button"}, "foo"=>"other"}}
Note the << as a literal key. Also, the hash under the data["defaults"] key is the same hash as the one under the data["node"]["<<"] key, i.e. they have the same object_id. You can now manipulate the data as you want, and when you write it out as YAML the anchors and aliases will still be in place, although the anchor names will have changed:
data['node']['foo'] = "yet another"
puts Psych.dump data
produces (older versions of Psych use the object_id of the hash to ensure unique anchor names; the current version uses sequential numbers instead):
---
defaults: &2151922820
  foo: bar
  zip: button
node:
  <<: *2151922820
  foo: yet another
If you want to have control over the anchor names, you can provide your own Psych::Visitors::Emitter. Here's a simple example based on your example and assuming there's only the one anchor:
class MyEmitter < Psych::Visitors::Emitter
  def visit_Psych_Nodes_Mapping o
    o.anchor = 'defaults' if o.anchor
    super
  end

  def visit_Psych_Nodes_Alias o
    o.anchor = 'defaults' if o.anchor
    super
  end
end
When used with the modified data hash from above:
# create an AST based on the Ruby data structure
# (YAMLTree.new takes arguments in current Psych; create supplies them for you)
builder = Psych::Visitors::YAMLTree.create
builder << data
ast = builder.tree

# write out the tree using the custom emitter
MyEmitter.new($stdout).accept ast
the output is:
---
defaults: &defaults
  foo: bar
  zip: button
node:
  <<: *defaults
  foo: yet another
(Update: another question asked how to do this with more than one anchor, where I came up with a possibly better way to keep anchor names when serializing.)
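For more than one anchor, one possible variation is sketched below. This is not the method from the linked answer; it is an illustration that rewrites the AST produced by YAMLTree before emitting it. It assumes that every anchored mapping sits directly under a top-level key and that this key is the anchor name you want. It also resets any << key scalars to plain style, on the assumption that some Psych versions quote that string when dumping.

```ruby
require 'psych'

# Loader from the answer above: keeps "<<" as an ordinary hash key.
class ToRubyNoMerge < Psych::Visitors::ToRuby
  def revive_hash(hash, o, tagged = false)
    @st[o.anchor] = hash if o.anchor
    o.children.each_slice(2) { |k, v| hash[accept(k)] = accept(v) }
    hash
  end
end

yaml = <<~EOS
  defaults: &defaults
    foo: bar
    zip: button
  node:
    <<: *defaults
    foo: other
EOS

data = ToRubyNoMerge.create.accept(Psych.parse(yaml))
data["node"]["foo"] = "yet another"

# Build an AST from the Ruby data; Psych assigns generated anchor names here.
builder = Psych::Visitors::YAMLTree.create
builder << data
ast  = builder.tree
root = ast.children.first.root

# Assumption: name each anchor after the top-level key that holds it.
names = {}
root.children.each_slice(2) do |k, v|
  names[v.anchor] = k.value if v.is_a?(Psych::Nodes::Mapping) && v.anchor
end

ast.each do |n|
  case n
  when Psych::Nodes::Mapping, Psych::Nodes::Alias
    n.anchor = names[n.anchor] if names.key?(n.anchor)
  when Psych::Nodes::Scalar
    next unless n.value == "<<"
    # Emit the merge key as a plain, untagged scalar.
    n.tag    = nil
    n.plain  = true
    n.quoted = false
    n.style  = Psych::Nodes::Scalar::PLAIN
  end
end

output = ast.to_yaml
puts output
```

Anchors nested deeper in the document would need a different naming rule, for example recording the original anchor names from the parsed AST before loading.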
YAML has aliases and they can round-trip, but you disable this by using hash merging. << as a mapping key seems to be a non-standard extension to YAML (in both 1.8's syck and 1.9's psych).
require 'yaml'

yaml = <<EOS
defaults: &defaults
  foo: bar
  zip: button
node: *defaults
EOS

# On Psych 4 and later, aliases must be enabled: YAML.load(yaml, aliases: true)
data = YAML.load yaml
print data.to_yaml
prints
---
defaults: &id001
  zip: button
  foo: bar
node: *id001
but the << in your data merges the aliased hash into a new one which is no longer an alias.
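The difference can be checked by comparing object identity after loading. This is a small sketch using Psych.parse(...).to_ruby so that it does not depend on the load defaults of the installed Psych version; the variable names are illustrative.

```ruby
require 'psych'

# "node" is a plain alias of the anchored mapping.
aliased = Psych.parse(<<~EOS).to_ruby
  defaults: &defaults
    foo: bar
  node: *defaults
EOS

# "node" merges the anchored mapping into a new mapping.
merged = Psych.parse(<<~EOS).to_ruby
  defaults: &defaults
    foo: bar
  node:
    <<: *defaults
EOS

# A plain alias loads as the very same Hash object; a merge loads as a copy.
same_when_aliased = aliased["node"].equal?(aliased["defaults"])
same_when_merged  = merged["node"].equal?(merged["defaults"])
p same_when_aliased
p same_when_merged

# Psych only writes an alias when the same object appears more than once.
aliased_dump = Psych.dump(aliased)
merged_dump  = Psych.dump(merged)
puts aliased_dump
puts merged_dump
```

Because the dumper decides on aliases by object identity, the merged copy has nothing linking it back to the anchored hash when it is written out.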