
Ignore dates and times while parsing YAML?

Tags: python, yaml

I'm writing a script to convert a series of YAML files into a single JSON blob. I have a YAML file like this:

---
AWSTemplateFormatVersion: 2010-09-09
Description: AWS CloudFormation ECS Sample
Parameters:
    - SolrCloudInstanceType:
        Type: String
        Description: Solr Cloud EC2 Instance Type
        Default: m3.2xlarge
Resources:
    - ContainerInstance:
        Type: AWS::EC2::Instance
        Properties:
            InstanceType: m3.xlarge

I'm loading it like this:

import yaml

with open('base.yml', 'rb') as f:
    result = yaml.safe_load(f)

Interestingly enough, if I inspect the AWSTemplateFormatVersion, I get a Python datetime.date object. This causes my JSON output to fail:

>>> json.dump(result, sys.stdout, sort_keys=True, indent=4)
{
    "AWSTemplateFormatVersion": Traceback (most recent call last):
  File "./c12n-assemble", line 42, in <module>
    __main__()
  File "./c12n-assemble", line 25, in __main__
    assembler.assemble()
  File "./c12n-assemble", line 39, in assemble
    json.dump(self.__result, self.__output_file, sort_keys=True, indent=4, separators=(',', ': '))
  File "/usr/lib/python2.7/json/__init__.py", line 189, in dump
    for chunk in iterable:
  File "/usr/lib/python2.7/json/encoder.py", line 434, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "/usr/lib/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib/python2.7/json/encoder.py", line 442, in _iterencode
    o = _default(o)
  File "/usr/lib/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: datetime.date(2010, 9, 9) is not JSON serializable

Is there a way to force the YAML parser to not be "smart" about what it considers a date or date+time and just parse a string?

asked Jan 07 '16 23:01 by Naftuli Kay




2 Answers

You can extend the PyYAML loader and remove the implicit tagging of timestamps, or other types, as follows:

import yaml


class NoDatesSafeLoader(yaml.SafeLoader):
    @classmethod
    def remove_implicit_resolver(cls, tag_to_remove):
        """
        Remove implicit resolvers for a particular tag

        Takes care not to modify resolvers in super classes.

        We want to load datetimes as strings, not dates, because we
        go on to serialise as json which doesn't have the advanced types
        of yaml, and leads to incompatibilities down the track.
        """
        # Give this class its own copy of the resolver table on first use
        if 'yaml_implicit_resolvers' not in cls.__dict__:
            cls.yaml_implicit_resolvers = cls.yaml_implicit_resolvers.copy()

        # Rebuild each per-first-character list without the unwanted tag
        for first_letter, mappings in cls.yaml_implicit_resolvers.items():
            cls.yaml_implicit_resolvers[first_letter] = [
                (tag, regexp)
                for tag, regexp in mappings
                if tag != tag_to_remove
            ]


NoDatesSafeLoader.remove_implicit_resolver('tag:yaml.org,2002:timestamp')

Use this alternate loader as follows:

>>> yaml.load("2015-03-22 01:49:21", Loader=NoDatesSafeLoader)
'2015-03-22 01:49:21'

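For reference, the original behavior would be (using safe_load here, since recent PyYAML versions require an explicit Loader argument to yaml.load):

>>> yaml.safe_load("2015-03-22 01:49:21")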
datetime.datetime(2015, 3, 22, 1, 49, 21)
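
As a rough end-to-end sketch of how this applies to the question's YAML-to-JSON conversion: the snippet below builds the same kind of timestamp-free loader in a more compact form (a dict comprehension assigned onto the subclass, rather than the classmethod above) and dumps the result as JSON. The TEMPLATE string is a trimmed copy of the question's file, used here only for illustration.

```python
import json

import yaml

TIMESTAMP_TAG = 'tag:yaml.org,2002:timestamp'


class NoDatesSafeLoader(yaml.SafeLoader):
    """SafeLoader variant that leaves timestamp-like scalars as strings."""


# Assign a filtered copy of the inherited resolver table onto the subclass,
# so yaml.SafeLoader itself is left untouched.
NoDatesSafeLoader.yaml_implicit_resolvers = {
    first_char: [(tag, regexp) for tag, regexp in resolvers
                 if tag != TIMESTAMP_TAG]
    for first_char, resolvers in yaml.SafeLoader.yaml_implicit_resolvers.items()
}

# Trimmed copy of the question's YAML (illustrative only).
TEMPLATE = """\
---
AWSTemplateFormatVersion: 2010-09-09
Description: AWS CloudFormation ECS Sample
"""

result = yaml.load(TEMPLATE, Loader=NoDatesSafeLoader)
output = json.dumps(result, sort_keys=True, indent=4)
print(output)
```

With this loader, AWSTemplateFormatVersion arrives as the string '2010-09-09', so json.dumps no longer hits the datetime.date TypeError from the question.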
answered Nov 03 '22 15:11 by Damien Ayers


The accepted answer's method is great for a PyYAML-based library. In fact, it should be part of PyYAML's BaseResolver class itself. But for a faster and kludgier in-place removal of a particular resolver (note that this changes yaml.SafeLoader for every caller in the process):

yaml.SafeLoader.yaml_implicit_resolvers = {
    k: [r for r in v if r[0] != 'tag:yaml.org,2002:timestamp']
    for k, v in yaml.SafeLoader.yaml_implicit_resolvers.items()
}

And then:

>>> yaml.load("2015-03-22 01:49:21", Loader=yaml.SafeLoader)
'2015-03-22 01:49:21'
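
Because the in-place approach rebinds an attribute on yaml.SafeLoader itself, yaml.safe_load is affected too. A minimal sketch of that side effect, and of one way to undo it, might look like the following; the names original_resolvers, patched, and restored are illustrative and not part of PyYAML.

```python
import datetime

import yaml

TIMESTAMP_TAG = 'tag:yaml.org,2002:timestamp'

# Keep a reference to the original table so the change can be undone later.
original_resolvers = yaml.SafeLoader.yaml_implicit_resolvers

# In-place removal: every user of SafeLoader now sees the filtered table.
yaml.SafeLoader.yaml_implicit_resolvers = {
    k: [r for r in v if r[0] != TIMESTAMP_TAG]
    for k, v in original_resolvers.items()
}
patched = yaml.safe_load("2015-03-22 01:49:21")   # plain string

# Put the original table back; timestamps are resolved again.
yaml.SafeLoader.yaml_implicit_resolvers = original_resolvers
restored = yaml.safe_load("2015-03-22 01:49:21")  # datetime.datetime

print(type(patched).__name__, type(restored).__name__)
```

If other code in the same process relies on timestamps being parsed by SafeLoader, the subclass approach from the first answer avoids this side effect.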
answered Nov 03 '22 14:11 by Nuno André