Deserialize a YAML "Table" of data

I am using YamlDotNet and C# to deserialize a file created by a third-party software application. Both of the following YAML files are valid output from the application:

#File1
Groups:
  - Name: ATeam
    FirstName, LastName, Age, Height:
      - [Joe, Soap, 21, 184]
      - [Mary, Ryan, 20, 169]
      - [Alex, Dole, 24, 174]

#File2
Groups:
  - Name: ATeam
    FirstName, LastName, Height:
      - [Joe, Soap, 184]
      - [Mary, Ryan, 169]
      - [Alex, Dole, 174]

Notice that File2 doesn't have an Age column, but the deserializer must still recognise that the third value on each line is a height rather than an age. This data is supposed to represent a table of people. In File1, for example, Mary Ryan is age 20 and is 169cm tall. The deserializer needs to understand which columns it has (File2 only has FirstName, LastName and Height) and store the data in the right objects accordingly: Mary Ryan is 169cm tall.

Similarly, the program documentation states that the order of the columns is not important, so File3 below is an equally valid way to represent the data in File2 even though Height now comes first:

#File3
Groups:
  - Name: ATeam
    Height, FirstName, LastName:
      - [184, Joe, Soap]
      - [169, Mary, Ryan]
      - [174, Alex, Dole]

I have a number of questions:

  1. Is this standard YAML? I could not find anything about putting several keys on the same line, followed by a colon and lists of values, to represent tables of data.
  2. How would I use YamlDotNet to deserialize this? Are there modifications I can make to help it?
  3. If I can't use YamlDotNet, how should I go about it?
Barry asked Jun 17 '15 14:06



2 Answers

As other answers stated, this is valid YAML. However, the structure of the document is specific to the application, and does not use any special feature of YAML to express tables.

You can easily parse this document using YamlDotNet. However, you will run into two difficulties. The first is that, since the names of the columns are placed inside the key, you will need some custom serialization code to handle them. The second is that you will need to implement some kind of abstraction to be able to access the data in a tabular way.

I have put together a proof of concept that illustrates how to parse and read the data.

First, create a type to hold the information from the YAML document:

using System.Collections.Generic;

public class Document
{
    public List<Group> Groups { get; set; }
}

public class Group
{
    public string Name { get; set; }

    public IEnumerable<string> ColumnNames { get; set; }

    public IList<IList<object>> Rows { get; set; }
}

Then implement IYamlTypeConverter to parse the Group type:

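using System;
using System.Collections.Generic;
using System.Linq;
using YamlDotNet.Core;
using YamlDotNet.Core.Events;
using YamlDotNet.Serialization;

public class GroupYamlConverter : IYamlTypeConverter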
{
    private readonly Deserializer deserializer;

    public GroupYamlConverter(Deserializer deserializer)
    {
        this.deserializer = deserializer;
    }

    public bool Accepts(Type type)
    {
        return type == typeof(Group);
    }

    public object ReadYaml(IParser parser, Type type)
    {
        var group = new Group();

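        var reader = new EventReader(parser);

        // The converter is invoked with the parser positioned at the start of
        // the group's mapping, so consume that event before reading any keys.
        reader.Expect<MappingStart>();
        do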
        {
            var key = reader.Expect<Scalar>();
            if(key.Value == "Name")
            {
                group.Name = reader.Expect<Scalar>().Value;
            }
            else
            {
                group.ColumnNames = key.Value
                    .Split(',')
                    .Select(n => n.Trim())
                    .ToArray();

                group.Rows = deserializer.Deserialize<IList<IList<object>>>(reader);
            }
        } while(!reader.Accept<MappingEnd>());
        reader.Expect<MappingEnd>();

        return group;
    }

    public void WriteYaml(IEmitter emitter, object value, Type type)
    {
        throw new NotImplementedException("TODO");
    }
}

Last, register the converter into the deserializer and deserialize the document:

var deserializer = new Deserializer();
deserializer.RegisterTypeConverter(new GroupYamlConverter(deserializer));

var document = deserializer.Deserialize<Document>(new StringReader(yaml));

You can test the fully working example here

This is only a proof of concept, but it should serve as a guideline for your own implementation. Things that could be improved include:

  • Checking for and handling invalid documents.
  • Improving the Group class. Maybe make it immutable, and also add an indexer.
  • Implementing the WriteYaml method if serialization support is desired.
Antoine Aubry answered Sep 29 '22 22:09


All of these are valid YAML files. You are, however, mistakenly interpreting a scalar key with commas as a YAML description of the "columns" in the sequences of the value associated with that key.

In File1, FirstName, LastName, Age, Height is a single string scalar key of the mapping that is the first element of the sequence that is the value for the key Groups at the top level, just as Name is. You can, but don't have to in YAML, put quotes around the whole scalar.

The association you make between the string "FirstName" and "Joe" is not there in YAML. You can make that association in the program that interprets the key (by splitting it on ", "), as you seem to be doing, but YAML has no knowledge of it.

So if you want to be smart about this, then you need to split the string "FirstName, LastName, Age, Height" yourself and use some mechanism to then use the "subkeys" to index the sequences that are associated with the key.
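As a minimal sketch of that idea in Python: the function name `rows_to_records` is made up for illustration, and the code assumes the document has already been loaded into plain dicts and lists by a YAML parser, with the key Name being the only one that is not a table header.

```python
def rows_to_records(group):
    """Turn one group mapping into a list of per-person dicts."""
    records = []
    for key, rows in group.items():
        if key == "Name":
            continue  # the group's own name, not a table header
        # "Height, FirstName, LastName" -> ["Height", "FirstName", "LastName"]
        columns = [name.strip() for name in key.split(",")]
        for row in rows:
            if len(row) != len(columns):
                raise ValueError(f"expected {len(columns)} values, got {row!r}")
            # Pair each "subkey" with the value at the same position.
            records.append(dict(zip(columns, row)))
    return records


# Data as a YAML loader would return it for File3 (Height listed first).
file3_group = {
    "Name": "ATeam",
    "Height, FirstName, LastName": [
        [184, "Joe", "Soap"],
        [169, "Mary", "Ryan"],
        [174, "Alex", "Dole"],
    ],
}

records = rows_to_records(file3_group)
mary = records[1]
print(mary["FirstName"], mary["LastName"], mary["Height"])
```

Because values are looked up by column name rather than by position, the same call handles File1 and File2 without changes; a missing Age column simply means the records have no "Age" entry.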

If it helps to understand all this, the following is a JSON dump of the first file's contents; there you can see clearly what the keys consist of:

{"Groups": [{"FirstName, LastName, Age, Height": [["Joe", "Soap", 21,
   184], ["Mary", "Ryan", 20, 169], ["Alex", "Dole", 24, 174]], 
   "Name": "ATeam"}]}

I used the Python-based ruamel.yaml library for this (of which I am the author), but you could also use an online converter/checker like http://yaml-online-parser.appspot.com/
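To confirm that the column header really is a single key, the dump above can also be loaded with Python's standard json module. This is only an illustrative sketch; the variable names are arbitrary and the dump is pasted in as a string.

```python
import json

# The JSON dump of File1 shown above, pasted verbatim.
dump = """{"Groups": [{"FirstName, LastName, Age, Height": [["Joe", "Soap", 21,
   184], ["Mary", "Ryan", 20, 169], ["Alex", "Dole", 24, 174]],
   "Name": "ATeam"}]}"""

data = json.loads(dump)
group = data["Groups"][0]

# Each key of the group mapping is one plain string; the commas are
# ordinary characters inside it, not separators the parser knows about.
keys = sorted(group)
for key in keys:
    print(repr(key), type(key).__name__)
```

The group has exactly two keys, and the one holding the table is the whole comma-separated string, so any notion of "columns" has to come from the consuming program.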

Anthon answered Sep 29 '22 23:09