
YAML vs Python configuration/parameter files (but perhaps also vs JSON vs XML)

I see Python used to do a fair amount of code generation for C/C++ header and source files. Usually, the input files which store parameters are in JSON or YAML format, although most of what I see is YAML. However, why not just use Python files directly? Why use YAML at all in this case?

That also got me thinking: since Python is a scripting language, its files, when containing only data and data structures, could be used in the same way as XML, JSON, YAML, etc. Do people do this? Is there a good use case for it?

What if I want to import a configuration file into a C or C++ program? What about into a Python program? In the Python case it seems to me there is no sense in using YAML at all, as you can just store your configuration parameters and variables in pure Python files. In the C or C++ case, it seems to me you could still store your data in Python files and then just have a Python script import that and auto-generate header and source files for you as part of the build process. Again, perhaps there's no need for YAML or JSON in this case at all either.
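As a minimal sketch of that build-step idea (the parameter names and file layout here are hypothetical, not from any real project), a Python script could turn a flat dict of parameters into the text of a C header:

```python
# Hypothetical build step: turn a flat Python dict of parameters
# into "#define" lines for an auto-generated C header.
params = {"BUFFER_SIZE": 1024, "TIMEOUT_MS": 500}

lines = ["// Auto-generated; do not edit.", "#pragma once", ""]
for name, value in params.items():
    lines.append(f"#define {name} {value}")

header = "\n".join(lines) + "\n"
print(header)
```

A real version would write `header` to a file and be invoked from the build system before compiling the C sources.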

Thoughts?

Here's an example of storing some nested key/value hash table pairs in a YAML file:

my_params.yml:

---
dict_key1:
    dict_key2:
        dict_key3a: my string message
        dict_key3b: another string message

And the exact same thing in a pure Python file:

my_params.py

data = {
    "dict_key1": {
        "dict_key2": {
            "dict_key3a": "my string message",
            "dict_key3b": "another string message",
        }
    }
}

And to read in both the YAML and Python data and print it out:

import_config_file.py:

import yaml # Module for reading in YAML files
import json # Module for pretty-printing Python dictionary types
            # See: https://stackoverflow.com/a/34306670/4561887

# 1) import .yml file
with open("my_params.yml", "r") as f:
    data_yml = yaml.safe_load(f)  # safe_load; plain yaml.load() is unsafe

# 2) import .py file
from my_params import data as data_py
# OR: Alternative method of doing the above:
# import my_params
# data_py = my_params.data

# 3) print them out
print("data_yml = ")
print(json.dumps(data_yml, indent=4))

print("\ndata_py = ")
print(json.dumps(data_py, indent=4))

Reference for using json.dumps: https://stackoverflow.com/a/34306670/4561887

SAMPLE OUTPUT of running python3 import_config_file.py:

data_yml = 
{
    "dict_key1": {
        "dict_key2": {
            "dict_key3a": "my string message",
            "dict_key3b": "another string message"
        }
    }
}

data_py = 
{
    "dict_key1": {
        "dict_key2": {
            "dict_key3a": "my string message",
            "dict_key3b": "another string message"
        }
    }
}
Gabriel Staples asked Nov 06 '22

1 Answer

Yes, people do this, and have been doing so for years.

But many make the same mistake you do and make it unsafe by using `import my_params`. That is the equivalent of loading YAML using `YAML(typ='unsafe')` in ruamel.yaml (or `yaml.load()` in PyYAML, which is unsafe).

What you should do instead is use the `ast` module that comes with Python to parse your "data" structure, which makes such an import safe. My package pon has code to update these kinds of structures, and in each of my `__init__.py` files there is such a piece of data, named `_package_data`, that is read by some code (the function `literal_eval`) in the package's setup.py. The `ast`-based code in setup.py takes around 100 lines.
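As a minimal sketch of that approach (the file contents and the variable name `data` are illustrative, not taken from pon), you can parse a Python source file with `ast` and safely evaluate only the literal on the right-hand side of the assignment:

```python
import ast

# Stand-in for the contents of a config file such as my_params.py;
# in practice you would read this string from disk.
source = 'data = {"dict_key1": {"dict_key2": "msg"}}'

tree = ast.parse(source)

# Find the top-level assignment to the name "data" and evaluate
# only its literal value. ast.literal_eval refuses function calls,
# imports, and any other executable code, so nothing is run.
data = None
for node in tree.body:
    if isinstance(node, ast.Assign) and node.targets[0].id == "data":
        data = ast.literal_eval(node.value)

print(data)  # {'dict_key1': {'dict_key2': 'msg'}}
```

Unlike `import my_params`, this never executes the file, so a malicious or buggy config cannot run arbitrary code at load time.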

The advantages of doing this in a structured way are the same as with using YAML: you can programmatically update the data structure (version numbers!), although I consider PON (Python Object Notation) less readable than YAML and slightly less easy to update manually.

Anthon answered Nov 12 '22