I'm new to Python, so I am building a simple program to convert YAML to JSON and JSON to YAML.
The yaml2json conversion works, but it writes the JSON out on a single line; a JSON validator does say it is correct.
This is my code so far:
import yaml
import json


def parseyaml(inFileType, outFileType):
    infile = input('Please enter a {} filename to parse: '.format(inFileType))
    outfile = input('Please enter a {} filename to output: '.format(outFileType))

    with open(infile, 'r') as stream:
        try:
            datamap = yaml.safe_load(stream)
            with open(outfile, 'w') as output:
                json.dump(datamap, output)
        except yaml.YAMLError as exc:
            print(exc)

    print('Your file has been parsed.\n\n')


def parsejson(inFileType, outFileType):
    infile = input('Please enter a {} filename to parse: '.format(inFileType))
    outfile = input('Please enter a {} filename to output: '.format(outFileType))

    with open(infile, 'r') as stream:
        try:
            datamap = json.load(stream)
            with open(outfile, 'w') as output:
                yaml.dump(datamap, output)
        except yaml.YAMLError as exc:
            print(exc)

    print('Your file has been parsed.\n\n')
An example of the original YAML vs. the new YAML
Original:
inputs:
  webTierCpu:
    type: integer
    minimum: 2
    default: 2
    maximum: 5
    title: Web Server CPU Count
    description: The number of CPUs for the Web nodes
New:
inputs:
  dbTierCpu: {default: 2, description: The number of CPUs for the DB node, maximum: 5,
    minimum: 2, title: DB Server CPU Count, type: integer}
It doesn't look like it's decoding all of the JSON, so I'm not sure where I should go next...
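First, a side note on the single-line JSON: that is simply json.dump's compact default, and passing an indent is enough to make the intermediate file readable. A minimal sketch (the filenames are placeholders):

import json
import yaml

with open('input.yaml') as stream:            # placeholder filename
    datamap = yaml.safe_load(stream)

with open('intermediate.json', 'w') as output:
    # indent=2 pretty-prints the JSON instead of writing everything on one line
    json.dump(datamap, output, indent=2)

The formatting and key-order problems described below are separate from that.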
Your file is losing its formatting because the original dump routine by default writes all leaf nodes in YAML flow style, whereas your input is block style all the way.
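For comparison, this is what the two styles look like; a quick sketch using PyYAML's default_flow_style switch and made-up data:

import yaml

data = {'inputs': {'webTierCpu': {'type': 'integer', 'minimum': 2}}}

# flow style: mappings are written inline with braces, as in your "New" output
print(yaml.dump(data, default_flow_style=True))

# block style: every key on its own indented line, as in your original file
print(yaml.dump(data, default_flow_style=False))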
You are also losing the order of the keys, which is first because the JSON parser uses dict, and second because dump sorts the output.
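(Aside: if key order were the only concern, newer PyYAML can keep it as well. This sketch assumes PyYAML 5.1+ and Python 3.7+, where json.load's plain dict already preserves insertion order:

import json
import yaml

with open('intermediate.json') as stream:
    datamap = json.load(stream)   # plain dict keeps insertion order on Python 3.7+

with open('output.yaml', 'w') as output:
    # sort_keys=False stops dump from alphabetising; default_flow_style=False forces block style
    yaml.dump(datamap, output, default_flow_style=False, sort_keys=False)

The rest of this answer uses ruamel.yaml instead, which also handles styles and comments.)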
If you look at your intermediate JSON you can already see that the key order is gone at that point. To preserve it, use the new API to load your YAML, and use a special JSON encoder as a replacement for dump that can handle the subclass of Mapping into which the YAML is loaded, similar to this example from the standard Python doc.
Assuming your YAML is stored in input.yaml:
import sys
import json
from collections.abc import Mapping, Sequence
from collections import OrderedDict
import ruamel.yaml

# if you instantiate a YAML instance as yaml, you have to explicitly import the error
from ruamel.yaml.error import YAMLError

yaml = ruamel.yaml.YAML()   # this uses the new API
# if you have standard indentation, no need to use the following
yaml.indent(sequence=4, offset=2)

input_file = 'input.yaml'
intermediate_file = 'intermediate.json'
output_file = 'output.yaml'


class OrderlyJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, Mapping):
            return OrderedDict(o)
        elif isinstance(o, Sequence):
            return list(o)
        return json.JSONEncoder.default(self, o)


def yaml_2_json(in_file, out_file):
    with open(in_file, 'r') as stream:
        try:
            datamap = yaml.load(stream)
            with open(out_file, 'w') as output:
                output.write(OrderlyJSONEncoder(indent=2).encode(datamap))
        except YAMLError as exc:
            print(exc)
            return False
    return True


yaml_2_json(input_file, intermediate_file)

with open(intermediate_file) as fp:
    sys.stdout.write(fp.read())
which gives:
{
  "inputs": {
    "webTierCpu": {
      "type": "integer",
      "minimum": 2,
      "default": 2,
      "maximum": 5,
      "title": "Web Server CPU Count",
      "description": "The number of CPUs for the Web nodes"
    }
  }
}
You see that your JSON has the appropriate key order, which we also need to preserve on loading. You can do that without subclassing anything, by specifying that JSON objects be loaded into the subclass of Mapping that the YAML parser uses internally, via object_pairs_hook.
from ruamel.yaml.comments import CommentedMap


def json_2_yaml(in_file, out_file):
    with open(in_file, 'r') as stream:
        try:
            datamap = json.load(stream, object_pairs_hook=CommentedMap)
            # if you need to "restore" literal style scalars, etc.
            # walk_tree(datamap)
            with open(out_file, 'w') as output:
                yaml.dump(datamap, output)
        except YAMLError as exc:   # imported above; the YAML() instance has no .YAMLError attribute
            print(exc)
            return False
    return True


json_2_yaml(intermediate_file, output_file)

with open(output_file) as fp:
    sys.stdout.write(fp.read())
Which outputs:
inputs:
  webTierCpu:
    type: integer
    minimum: 2
    default: 2
    maximum: 5
    title: Web Server CPU Count
    description: The number of CPUs for the Web nodes
And I hope that that is similar enough to your original input to be acceptable.
Notes:
When using the new API I tend to use yaml as the name of the instance of ruamel.yaml.YAML(), instead of doing from ruamel import yaml. That however masks the use of yaml.YAMLError, because the error class is not an attribute of YAML(); hence the explicit from ruamel.yaml.error import YAMLError in the code above.
If you are developing this kind of stuff, I can recommend separating at least the user input from the actual functionality. It should be trivial to write your parseyaml and parsejson so that they only prompt for the filenames and then call yaml_2_json resp. json_2_yaml, e.g. like the sketch that follows.
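A minimal sketch of such wrappers (it assumes the yaml_2_json and json_2_yaml functions defined above; the default labels are just placeholders):

def parseyaml(inFileType='YAML', outFileType='JSON'):
    # only gathers user input, then delegates to the pure conversion function
    infile = input('Please enter a {} filename to parse: '.format(inFileType))
    outfile = input('Please enter a {} filename to output: '.format(outFileType))
    if yaml_2_json(infile, outfile):
        print('Your file has been parsed.\n\n')


def parsejson(inFileType='JSON', outFileType='YAML'):
    infile = input('Please enter a {} filename to parse: '.format(inFileType))
    outfile = input('Please enter a {} filename to output: '.format(outFileType))
    if json_2_yaml(infile, outfile):
        print('Your file has been parsed.\n\n')

This keeps yaml_2_json and json_2_yaml testable without having to fake interactive input.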
Any comments in your original YAML file will be lost, although ruamel.yaml can load them (a tiny demo follows). JSON originally did allow comments, but it is not in the specification and no parser that I know of can output them.
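To make that concrete, a small round-trip demo; it only assumes ruamel.yaml is installed:

import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML()          # round-trip mode, which keeps comments
doc = """\
inputs:
  webTierCpu:
    minimum: 2   # hardware floor, do not lower
"""
data = yaml.load(doc)
yaml.dump(data, sys.stdout)        # the comment survives a YAML -> YAML round trip
# a detour through json.dump()/json.load() would drop it: JSON has no comment syntax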
Since your real file has literal block scalars you have to use some magic to get those back.
Include the following functions, which walk a tree, recursing into dict values and list elements, and convert any string with an embedded newline, in place, to a type that gets output to YAML as a literal block style scalar (hence no return value):
from ruamel.yaml.scalarstring import PreservedScalarString, SingleQuotedScalarString
from ruamel.yaml.compat import string_types, MutableMapping, MutableSequence


def preserve_literal(s):
    return PreservedScalarString(s.replace('\r\n', '\n').replace('\r', '\n'))


def walk_tree(base):
    if isinstance(base, MutableMapping):
        for k in base:
            v = base[k]  # type: Text
            if isinstance(v, string_types):
                if '\n' in v:
                    base[k] = preserve_literal(v)
                elif '${' in v or ':' in v:
                    base[k] = SingleQuotedScalarString(v)
            else:
                walk_tree(v)
    elif isinstance(base, MutableSequence):
        for idx, elem in enumerate(base):
            if isinstance(elem, string_types):
                if '\n' in elem:
                    base[idx] = preserve_literal(elem)
                elif '${' in elem or ':' in elem:
                    base[idx] = SingleQuotedScalarString(elem)
            else:
                walk_tree(elem)
And then do walk_tree(datamap) after you load the data from JSON (the call is already sketched as a comment in json_2_yaml above).
With all of the above, you should have only one line that differs in your Wordpress.yaml file.
For quick command-line conversions, the following Bash functions wrap Python one-liners that read from stdin:

function yaml_validate {
python -c 'import sys, yaml, json; yaml.safe_load(sys.stdin.read())'
}
function yaml2json {
python -c 'import sys, yaml, json; print(json.dumps(yaml.safe_load(sys.stdin.read())))'
}
function yaml2json_pretty {
python -c 'import sys, yaml, json; print(json.dumps(yaml.safe_load(sys.stdin.read()), indent=2, sort_keys=False))'
}
function json_validate {
python -c 'import sys, yaml, json; json.loads(sys.stdin.read())'
}
function json2yaml {
python -c 'import sys, yaml, json; print(yaml.dump(json.loads(sys.stdin.read())))'
}
More useful Bash tricks at http://github.com/frgomes/bash-scripts