I'm writing a script to convert a series of YAML files into a single JSON blob. I have a YAML file like this:
---
AWSTemplateFormatVersion: 2010-09-09
Description: AWS CloudFormation ECS Sample
Parameters:
  - SolrCloudInstanceType:
      Type: String
      Description: Solr Cloud EC2 Instance Type
      Default: m3.2xlarge
Resources:
  - ContainerInstance:
      Type: AWS::EC2::Instance
      Properties:
        InstanceType: m3.xlarge
I'm loading it like this:
import yaml
with open('base.yml', 'rb') as f:
    result = yaml.safe_load(f)
Interestingly enough, if I inspect the AWSTemplateFormatVersion value, I get a Python datetime.date object. This causes my JSON output to fail:
>>> json.dump(result, sys.stdout, sort_keys=True, indent=4)
{
    "AWSTemplateFormatVersion": Traceback (most recent call last):
  File "./c12n-assemble", line 42, in <module>
    __main__()
  File "./c12n-assemble", line 25, in __main__
    assembler.assemble()
  File "./c12n-assemble", line 39, in assemble
    json.dump(self.__result, self.__output_file, sort_keys=True, indent=4, separators=(',', ': '))
  File "/usr/lib/python2.7/json/__init__.py", line 189, in dump
    for chunk in iterable:
  File "/usr/lib/python2.7/json/encoder.py", line 434, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "/usr/lib/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib/python2.7/json/encoder.py", line 442, in _iterencode
    o = _default(o)
  File "/usr/lib/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: datetime.date(2010, 9, 9) is not JSON serializable
Is there a way to force the YAML parser to not be "smart" about what it considers a date or date+time and just parse a string?
You can extend the PyYAML loader and remove the implicit tagging of timestamps, or other types, as follows:
class NoDatesSafeLoader(yaml.SafeLoader):
    @classmethod
    def remove_implicit_resolver(cls, tag_to_remove):
        """
        Remove implicit resolvers for a particular tag

        Takes care not to modify resolvers in super classes.

        We want to load datetimes as strings, not dates, because we
        go on to serialise as json which doesn't have the advanced types
        of yaml, and leads to incompatibilities down the track.
        """
        # Copy the resolver table onto this class before changing it,
        # so that yaml.SafeLoader itself keeps its timestamp resolver.
        if 'yaml_implicit_resolvers' not in cls.__dict__:
            cls.yaml_implicit_resolvers = cls.yaml_implicit_resolvers.copy()

        for first_letter, mappings in cls.yaml_implicit_resolvers.items():
            cls.yaml_implicit_resolvers[first_letter] = [
                (tag, regexp)
                for tag, regexp in mappings
                if tag != tag_to_remove
            ]

NoDatesSafeLoader.remove_implicit_resolver('tag:yaml.org,2002:timestamp')
Use this alternate loader as follows:
>>> yaml.load("2015-03-22 01:49:21", Loader=NoDatesSafeLoader)
'2015-03-22 01:49:21'
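For reference, the original behavior would be (shown with yaml.safe_load, because recent PyYAML releases require an explicit Loader argument for yaml.load):
>>> yaml.safe_load("2015-03-22 01:49:21")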
datetime.datetime(2015, 3, 22, 1, 49, 21)
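To tie this back to the template in the question, here is a minimal end-to-end sketch. It assumes PyYAML is installed; the loader name NoTimestampLoader and the inline TEMPLATE string are made up for illustration, and the resolver table is built with a dict comprehension instead of the classmethod above.

```python
import json

import yaml  # PyYAML, a third-party package


class NoTimestampLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader variant that keeps date-like scalars as strings."""


# Give the subclass its own resolver table without the timestamp entry.
# New lists are built for every key, so yaml.SafeLoader is left untouched.
NoTimestampLoader.yaml_implicit_resolvers = {
    first_char: [
        (tag, regexp)
        for tag, regexp in resolvers
        if tag != "tag:yaml.org,2002:timestamp"
    ]
    for first_char, resolvers in yaml.SafeLoader.yaml_implicit_resolvers.items()
}

# Shortened stand-in for base.yml from the question.
TEMPLATE = """\
---
AWSTemplateFormatVersion: 2010-09-09
Description: AWS CloudFormation ECS Sample
"""

template = yaml.load(TEMPLATE, Loader=NoTimestampLoader)

# The version is now a plain string, so json can serialize it.
as_json = json.dumps(template, sort_keys=True, indent=4)
print(as_json)
```

Because only the subclass carries the modified table, other code in the same process that calls yaml.safe_load still gets date objects.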
The accepted answer's method works well for any PyYAML-based library; arguably it should be part of PyYAML's BaseResolver class itself. For a quicker but kludgier in-place removal of a particular resolver, which changes yaml.SafeLoader for the whole process:
yaml.SafeLoader.yaml_implicit_resolvers = {
    k: [r for r in v if r[0] != 'tag:yaml.org,2002:timestamp']
    for k, v in yaml.SafeLoader.yaml_implicit_resolvers.items()
}
And then:
>>> yaml.load("2015-03-22 01:49:21", Loader=yaml.SafeLoader)
'2015-03-22 01:49:21'
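One thing to keep in mind with the in-place approach is that yaml.safe_load uses yaml.SafeLoader too, so every caller in the process is affected. The sketch below, which assumes PyYAML is installed and uses illustrative variable names, shows that side effect and one way to undo it by keeping a reference to the original table.

```python
import yaml  # PyYAML, a third-party package

TIMESTAMP_TAG = "tag:yaml.org,2002:timestamp"

# Keep a reference to the resolver table SafeLoader currently uses.
original_resolvers = yaml.SafeLoader.yaml_implicit_resolvers

# Apply the in-place patch: rebuild the table without timestamp entries.
yaml.SafeLoader.yaml_implicit_resolvers = {
    k: [r for r in v if r[0] != TIMESTAMP_TAG]
    for k, v in original_resolvers.items()
}

# safe_load is affected as well, because it is backed by yaml.SafeLoader.
patched = yaml.safe_load("2015-03-22 01:49:21")

# Put the original table back to get the default behavior again.
yaml.SafeLoader.yaml_implicit_resolvers = original_resolvers
restored = yaml.safe_load("2015-03-22 01:49:21")

print(type(patched).__name__, type(restored).__name__)
```

If restoring the table feels fragile, the subclass approach from the accepted answer avoids the global change altogether.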