
PyYAML yaml.dump() produces complex key for string key > 122 chars?

Using PyYAML 3.11 with Python 2.7.6, let's dump a simple dictionary that has just a single string key (of length 122 chars), mapping to the value 1:

>>> print yaml.dump({'12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012': 1})
{'12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012': 1}

That works just as expected - simple, human-readable YAML. But now let's increase the length of that string key to 123 chars. Now PyYAML creates a less-human-readable complex key, starting with "?", and it shunts the value "1" onto a new line:

>>> print yaml.dump({'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123': 1})
{? '123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123'
  : 1}

Why does PyYAML do this? Is there any way to disable the behavior? It is leading to an undesirable lack of visual consistency in my dumped YAML code, depending on the length of the string key.

Jonathan Rice asked Jul 03 '15 01:07



1 Answer

You get the explicit key marker ? because the key exceeds the length limit for a simple key. A function in the emitter compares the length against 128 (the length of the implicit tag !!str pushes a 123-character key over that threshold). You can rewrite the complete function that checks whether a key is simple, but there is no simpler way to change the limit, as the value is hard-coded within that function.
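To make the arithmetic concrete, here is a minimal sketch of that length check. It is a simplified model, not PyYAML's actual code: the function name fits_simple_key is hypothetical, and it assumes the comparison is a strict less-than against 128, which matches the 122/123 boundary seen in the question.

```python
# Simplified model of the length check described above; NOT PyYAML's code.
MAX_SIMPLE_KEY_LENGTH = 128  # threshold quoted in the answer
IMPLICIT_TAG = "!!str"       # implicit tag whose length is counted with the key


def fits_simple_key(key, tag=IMPLICIT_TAG, limit=MAX_SIMPLE_KEY_LENGTH):
    """Return True if tag length plus key length stays below the limit.

    Hypothetical helper: assumes a strict '<' comparison.
    """
    return len(tag) + len(key) < limit


# The two keys from the question: 122 and 123 characters long.
key_122 = "1234567890" * 12 + "12"
key_123 = "1234567890" * 12 + "123"

print(len(key_122), fits_simple_key(key_122))  # 5 + 122 = 127, below 128
print(len(key_123), fits_simple_key(key_123))  # 5 + 123 = 128, not below 128
```

Under these assumptions the 122-character key stays a simple key, while the 123-character key is emitted with the explicit ? marker.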

I have never been able to find a reason for this particular threshold, either in the YAML spec or in the PyYAML source. On the parser side, PyYAML handles such long keys without problems (with or without ?).

In ruamel.yaml ¹, you can change the threshold by setting MAX_SIMPLE_KEY_LENGTH on the dumper class:

from __future__ import print_function

import sys
import ruamel.yaml as yaml

yaml_str = """\
- {'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123': 1}
"""

data = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
dumper = yaml.RoundTripDumper
print('MAX_SIMPLE_KEY_LENGTH', dumper.MAX_SIMPLE_KEY_LENGTH)

yaml.dump(data, sys.stdout, Dumper=dumper)
dumper.MAX_SIMPLE_KEY_LENGTH = 256
print('After raising the threshold:')
yaml.dump(data, sys.stdout, Dumper=dumper)

will give you:

MAX_SIMPLE_KEY_LENGTH 128
- {? '123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123'
  : 1}
After raising the threshold:
- {'123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123': 1}

As I normally work in terminal windows of 80 columns, I still find keys that long difficult to read, but of course YMMV. Especially when round-tripping YAML, you need fine control over when your keys get changed this way.

¹ Disclaimer: I am the author of that enhanced version of PyYAML.

Anthon answered Oct 02 '22 12:10
