YAML preprocessor / macro processor

Is there a simple way to use a preprocessor / macro processor with YAML files? (I'm thinking of something along the lines of the C preprocessor.)

We have a lot of flat text files that describe various data structures. They're currently in our own in-house format, and are read with an in-house parser. I'd like to switch to YAML files to make use of the various pre-existing libraries for reading and writing.

However, our files are hierarchical: we "include" master files into sub-files and use variable substitution to generate new data structures.

As a toy example I'd want something like:

country_master.yaml

name: $COUNTRY$
file: C:\data\$COUNTRY$

UK_country.yaml

#define $COUNTRY$ UK
#include <country_master.yaml>

USA_country.yaml

#define $COUNTRY$ USA
#include <country_master.yaml>

Then after preprocessing we'd get something like:

name: USA
file: C:\data\USA

The C-preprocessor won't work with the # character used in YAML comments. Also, ideally we'd like to have loops which are expanded by the preprocessor, so in the above example we'd create UK and USA together with a loop (and I don't believe you can loop with cpp).

Any ideas?

Justin asked May 19 '15 14:05


2 Answers

# Yamp - YAML Macro-Processor
# https://github.com/birchb1024/yamp

# in master.yaml
defmacro:
  name: country
  args: [$COUNTRY$]
  value:
    name: $COUNTRY$
    file: C:\data\{{$COUNTRY$}}
---
# in some file
- include: [master.yaml]

# Call with wherever needed:
{ country: USA }
Bill Birch answered Nov 04 '22 19:11


You are trying to change things on the level of the string representation of YAML, and I think you shouldn't. YAML can load objects, and those objects can influence later elements loaded, by hooking into the parser. That way you can replace complete nodes with data, change values within scalars, etc.

Let's assume you have this YAML file main.yml:

- !YAMLPreProcessor
  verbose: '3'
  escape: ♦
- ♦replace(verbose)
- abcd
- ♦include(xyz.yml)
- xyz

and that xyz.yml is:

k: 9
l: 8
m: [7. 6]   # can be either

and you have ♦ as the special character (it could be anything, as long as the YAMLPreProcessor value for escape matches the start of the action keywords, replace and include). You want this to be round-tripped (loaded into data in memory and then dumped) to the following YAML:

- !YAMLPreProcessor
  verbose: '3'
  escape: ♦
- '3'
- abcd
- k: 9
  l: 8
  m: [7. 6] # can be either
- xyz

You can do that by overriding the scalar constructor, which gets called for each scalar, and adding an appropriate YAMLPreProcessor class:

# coding: utf-8

from __future__ import print_function

import ruamel.yaml as yaml

def construct_scalar(loader, node):
    # the preprocessor instance is attached to the loader by __yaml_in__
    self = getattr(loader, '_yaml_preprocessor', None)
    if self and self.d.get('escape'):
        escape = self.d['escape']
        if node.value and node.value.startswith(escape):
            # split "<escape>keyword(args)rest" into its parts
            key_word, rest = node.value[len(escape):].split('(', 1)
            args, rest = rest.split(')', 1)
            if key_word == 'replace':
                # concatenate the preprocessor values named in args
                res = u''
                for arg in args.split(','):
                    res += str(self.d[arg])
                node.value = res + rest
            elif key_word == 'include':
                # replace the scalar with the loaded contents of the file
                with open(args) as fp:
                    inc_yml = yaml.load(fp, Loader=yaml.RoundTripLoader)
                # this needs ruamel.yaml>=0.9.6
                return inc_yml
            else:
                print('keyword not found:', key_word)
    ret_val = loader._org_construct_scalar(node)
    # print('ret_val', type(ret_val), ret_val)
    return ret_val

class YAMLPreProcessor:
    def __init__(self, escape=None, verbose=0):
        self.d = dict(escape=escape, verbose=verbose)

    def __repr__(self):
        return "YAMLPreProcessor({escape!r}, {verbose})".format(**self.d)

    @staticmethod
    def __yaml_out__(dumper, self):
        return dumper.represent_mapping('!YAMLPreProcessor', self.d)

    @staticmethod
    def __yaml_in__(loader, data):
        from ruamel.yaml.comments import CommentedMap
        result = YAMLPreProcessor()
        loader._yaml_preprocessor = result
        z = dict()
        loader.construct_mapping(data, z)
        result.d = z
        yield result

    def __delete__(self):
        loader._yaml_preprocessor = None



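import sys

from ruamel.yaml.scalarstring import ScalarString

PY3 = sys.version_info[0] >= 3


# string constructor that tolerates construct_scalar returning a non-string
def construct_yaml_str(self, node):
    value = self.construct_scalar(node)
    if isinstance(value, ScalarString):
        return value
    if PY3:
        return value
    try:
        return value.encode('ascii')
    except AttributeError:
        # in case you replace the node dynamically e.g. with a dict
        return value
    except UnicodeEncodeError:
        return value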


loader = yaml.RoundTripLoader

loader.add_constructor('!YAMLPreProcessor', YAMLPreProcessor.__yaml_in__)
loader._org_construct_scalar = loader.construct_scalar
loader.construct_scalar = construct_scalar

data_from_yaml = yaml.load(open('main.yml'), Loader=loader)

#print ('out', data_from_yaml)

dumper = yaml.RoundTripDumper
# need to be able to represent '!YAMLPreProcessor'
# but you can of course also remove the first element
# from data_from_yaml if you don't want the preprocessor in your output
dumper.add_representer(YAMLPreProcessor, YAMLPreProcessor.__yaml_out__)

print(yaml.dump(data_from_yaml, Dumper=dumper, allow_unicode=True))

The above needs ruamel.yaml 0.9.6 or later, as older versions choke if construct_scalar returns a non-string object.

Please note that the position of the comment behind the line with the m key is relative to the start of the line, and in the example there is no compensation for the indent level of the node where the xyz.yml file is inserted.
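
The keyword handling inside construct_scalar can also be tried out on its own, without ruamel.yaml installed. The following is only a minimal sketch under some assumptions: parse_action and expand_replace are hypothetical helper names introduced here for illustration (they are not part of ruamel.yaml or of the code above), the settings dict stands in for the mapping held by YAMLPreProcessor, and, unlike the code above, argument names are stripped of surrounding whitespace and malformed actions are left untouched instead of raising.

```python
# Stdlib-only sketch of the "<escape>keyword(args)rest" syntax used above.
# parse_action and expand_replace are hypothetical names, not a library API.


def parse_action(value, escape):
    """Split e.g. '♦replace(a,b)tail' into ('replace', ['a', 'b'], 'tail').

    Returns None if value does not start with the escape string or does
    not contain a 'keyword(args)' part.
    """
    if not escape or not value.startswith(escape):
        return None
    body = value[len(escape):]
    if '(' not in body:
        return None
    key_word, rest = body.split('(', 1)
    if ')' not in rest:
        return None
    args, tail = rest.split(')', 1)
    # assumption: whitespace around argument names is not significant
    return key_word, [a.strip() for a in args.split(',')], tail


def expand_replace(value, settings):
    """Apply the 'replace' action using the preprocessor settings mapping."""
    parsed = parse_action(value, settings.get('escape'))
    if parsed is None or parsed[0] != 'replace':
        # ordinary scalars and other actions (e.g. include) are left as-is
        return value
    _, args, tail = parsed
    return ''.join(str(settings[a]) for a in args) + tail


settings = {'escape': '♦', 'verbose': '3'}
print(expand_replace('♦replace(verbose)', settings))
print(expand_replace('abcd', settings))
```

Keeping the parsing separate from the loader hook makes it possible to check the action syntax in isolation before wiring it into construct_scalar.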

Anthon answered Nov 04 '22 20:11