
Parsing YAML in Python: Detect duplicated keys

The yaml library in Python does not detect duplicated keys. This bug was reported years ago and has still not been fixed.

I would like to find a decent workaround for this problem. How plausible would it be to create a regex that returns all the keys? With that, detecting duplicates would be fairly easy.

Could any regex master suggest a regex that extracts all the keys so that duplicates can be found?

File example:

mykey1:
    subkey1: value1
    subkey2: value2
    subkey3:
      - value 3.1
      - value 3.2
mykey2:
    subkey1: this is not duplicated
    subkey5: value5
    subkey5: duplicated!
    subkey6:
       subkey6.1: value6.1
       subkey6.2: value6.2
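
As a rough illustration of the regex idea applied to a file like the one above: a flat list of keys is not enough, because subkey1 legitimately appears under both mykey1 and mykey2. The sketch below therefore pairs a regex with an indentation stack, so keys are only compared within the same mapping. The names KEY_RE, SAMPLE and find_duplicate_keys are hypothetical, and the approach assumes simple block-style YAML only: it does not handle flow mappings ({a: 1}), multi-line scalars, quoted keys containing colons, or mappings nested inside list items.

```python
import re

# Assumption: one "key: value" or "key:" per line, block style only.
# Group 1 is the indentation, group 2 is the key text.
KEY_RE = re.compile(r"^(\s*)([^\s#\-][^:#]*?)\s*:(?:\s|$)")

SAMPLE = """\
mykey1:
    subkey1: value1
    subkey2: value2
    subkey3:
      - value 3.1
      - value 3.2
mykey2:
    subkey1: this is not duplicated
    subkey5: value5
    subkey5: duplicated!
    subkey6:
       subkey6.1: value6.1
       subkey6.2: value6.2
"""


def find_duplicate_keys(text):
    """Return (line_number, key) pairs for keys repeated within one mapping."""
    duplicates = []
    stack = []  # entries are (indent, keys seen so far at that indent)
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = KEY_RE.match(line)
        if not match:
            continue  # list items, comments, blank lines, etc.
        indent, key = len(match.group(1)), match.group(2)
        # Leaving nested mappings: drop every level deeper than this line.
        while stack and stack[-1][0] > indent:
            stack.pop()
        # Entering a new nested mapping: start a fresh set of seen keys.
        if not stack or stack[-1][0] < indent:
            stack.append((indent, set()))
        seen = stack[-1][1]
        if key in seen:
            duplicates.append((lineno, key))
        seen.add(key)
    return duplicates


for lineno, key in find_duplicate_keys(SAMPLE):
    print(f"line {lineno}: duplicated key {key!r}")
# prints: line 10: duplicated key 'subkey5'
```

Because of the limitations listed above, this is only a stopgap for simple files; a real YAML parser or linter is more reliable.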
Tk421 asked Nov 03 '15 03:11




1 Answer

The yamllint command-line tool does what you want:

sudo pip install yamllint

Specifically, its key-duplicates rule detects repeated keys in a mapping, where one value would overwrite another:

$ yamllint test.yaml
test.yaml
  1:1       warning  missing document start "---"  (document-start)
  10:5      error    duplication of key "subkey5" in mapping  (key-duplicates)

(It has many other rules that you can enable/disable or tweak.)

Adrien Vergé answered Sep 22 '22 23:09
