I am using YamlDotNet and C# to deserialize a file created by a third-party software application. Both of the following YAML files are valid output from the application:
```yaml
#File1
Groups:
  - Name: ATeam
    FirstName, LastName, Age, Height:
      - [Joe, Soap, 21, 184]
      - [Mary, Ryan, 20, 169]
      - [Alex, Dole, 24, 174]
```
```yaml
#File2
Groups:
  - Name: ATeam
    FirstName, LastName, Height:
      - [Joe, Soap, 184]
      - [Mary, Ryan, 169]
      - [Alex, Dole, 174]
```
Notice that File2 doesn't have an Age column, but the deserializer must still recognise that the third value on each line is a height rather than an age. The data is supposed to represent a table of people. In File1, for example, Mary Ryan is 20 years old and 169 cm tall. The deserializer needs to understand which columns it has (for File2, only FirstName, LastName and Height) and store the data in the right objects accordingly: Mary Ryan is 169 cm tall.
Similarly, the program's documentation states that the order of the columns is not important. File3 below is therefore an equally valid way to represent the data in File2, even though Height now comes first:
```yaml
#File3
Groups:
  - Name: ATeam
    Height, FirstName, LastName:
      - [184, Joe, Soap]
      - [169, Mary, Ryan]
      - [174, Alex, Dole]
```
I have a number of questions:
As other answers stated, this is valid YAML. However, the structure of the document is specific to the application, and does not use any special feature of YAML to express tables.
You can easily parse this document using YamlDotNet. However, you will run into two difficulties. The first is that the column names are placed inside the key, so you will need some custom deserialization code to handle them. The second is that you will need some kind of abstraction to access the data in a tabular way.
I have put together a proof of concept that illustrates how to parse and read the data.
First, create a type to hold the information from the YAML document:
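```csharp
using System.Collections.Generic;

// Root of the YAML document: a list of groups.
public class Document
{
    public List<Group> Groups { get; set; }
}

// One group: its name, the column names taken from the composite key,
// and the rows of values, in the same order as those column names.
public class Group
{
    public string Name { get; set; }
    public IEnumerable<string> ColumnNames { get; set; }
    public IList<IList<object>> Rows { get; set; }
}
```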
Then implement `IYamlTypeConverter` to parse the `Group` type:
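```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using YamlDotNet.Core;
using YamlDotNet.Core.Events;
using YamlDotNet.Serialization;

// Written against an older YamlDotNet API (EventReader,
// Deserializer.RegisterTypeConverter); newer releases changed these APIs.
public class GroupYamlConverter : IYamlTypeConverter
{
    private readonly Deserializer deserializer;

    public GroupYamlConverter(Deserializer deserializer)
    {
        this.deserializer = deserializer;
    }

    public bool Accepts(Type type)
    {
        return type == typeof(Group);
    }

    public object ReadYaml(IParser parser, Type type)
    {
        var group = new Group();
        var reader = new EventReader(parser);

        // Each group is a YAML mapping, so consume its opening event first.
        reader.Expect<MappingStart>();
        while (!reader.Accept<MappingEnd>())
        {
            var key = reader.Expect<Scalar>();
            if (key.Value == "Name")
            {
                group.Name = reader.Expect<Scalar>().Value;
            }
            else
            {
                // Any other key is the composite header, e.g. "FirstName, LastName, Height".
                group.ColumnNames = key.Value
                    .Split(',')
                    .Select(n => n.Trim())
                    .ToArray();
                // The value is a sequence of flow sequences, one per row.
                group.Rows = deserializer.Deserialize<IList<IList<object>>>(reader);
            }
        }
        reader.Expect<MappingEnd>();
        return group;
    }

    public void WriteYaml(IEmitter emitter, object value, Type type)
    {
        throw new NotImplementedException("TODO");
    }
}
```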
Last, register the converter into the deserializer and deserialize the document:
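```csharp
using System.IO;
using YamlDotNet.Serialization;

// "yaml" holds the text of the YAML file (File1, File2 or File3).
var deserializer = new Deserializer();
deserializer.RegisterTypeConverter(new GroupYamlConverter(deserializer));
var document = deserializer.Deserialize<Document>(new StringReader(yaml));
```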
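The second difficulty is accessing the data in a tabular way. A small wrapper over a deserialized `Group` could handle it. The following is a sketch under my own assumptions and is not part of the proof of concept: `GroupTable`, `HasColumn` and the `[row, column]` indexer are hypothetical names. The wrapper copies the column names and rows into read-only structures and looks cells up by column name, so the order of the header in the file no longer matters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical immutable view over a group, with an indexer that
// looks up a cell by row number and column name.
public sealed class GroupTable
{
    private readonly Dictionary<string, int> columnIndex;
    private readonly IReadOnlyList<IReadOnlyList<object>> rows;

    public GroupTable(string name, IEnumerable<string> columnNames, IEnumerable<IEnumerable<object>> rows)
    {
        Name = name;
        ColumnNames = columnNames.ToArray();
        // Map each column name to its position in the header, e.g. "Height" -> 0 in File3.
        // Duplicate column names make ToDictionary throw an ArgumentException.
        columnIndex = ColumnNames
            .Select((column, position) => (column, position))
            .ToDictionary(pair => pair.column, pair => pair.position, StringComparer.Ordinal);
        // Copy the rows so later changes to the caller's lists cannot leak in.
        this.rows = rows.Select(row => (IReadOnlyList<object>)row.ToArray()).ToArray();
    }

    public string Name { get; }
    public IReadOnlyList<string> ColumnNames { get; }
    public int RowCount => rows.Count;

    public bool HasColumn(string column) => columnIndex.ContainsKey(column);

    // Returns the cell in the given row for the named column,
    // wherever that column appears in the header.
    public object this[int row, string column]
    {
        get
        {
            if (!columnIndex.TryGetValue(column, out var position))
                throw new KeyNotFoundException($"Column '{column}' is not present in group '{Name}'.");
            return rows[row][position];
        }
    }
}
```

With a `Group` produced by the converter above, you could write `new GroupTable(group.Name, group.ColumnNames, group.Rows)`. The cell values are whatever YamlDotNet produced for `object`, which for untyped scalars is likely to be strings.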
This is only a proof of concept, but it should serve as a guideline for your own implementation. Things that could be improved include:
- Improve the `Group` class. Maybe make it immutable, and also add an indexer.
- Implement the `WriteYaml` method if serialization support is desired.

All of these are valid YAML files. However, you are mistaken in interpreting a scalar key containing commas as a YAML description of the "columns" in the sequences of the value associated with that key.
In File1, `FirstName, LastName, Age, Height` is a single string scalar. It is a key in the mapping that forms the first element of the sequence under the top-level key `Groups`, just as `Name` is. In YAML you may put quotes around the whole scalar, but you don't have to.
The association you make between the string "FirstName" and "Joe" does not exist in YAML. You can make that association in the program that interprets the key (by splitting it on `", "`), as you seem to be doing, but YAML has no knowledge of it.
So if you want to be smart about this, you need to split the string `"FirstName, LastName, Age, Height"` yourself. Then use the resulting "subkeys" to index the sequences associated with that key.
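Here is a sketch of that idea in C#, the language of the question. The hypothetical `Person` record and `PersonTable.ToPeople` helper split the composite key and pair each column name with the value at the same position in a row. They then read fields by name rather than by position. Columns that are absent, such as `Age` in File2 and File3, become null instead of shifting the remaining values. The sketch assumes C# 9 or later for the record syntax. It also assumes row values arrive as strings or numbers that `Convert` can handle.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical target type; Age and Height are nullable because a file may omit them.
public sealed record Person(string FirstName, string LastName, int? Age, int? Height);

public static class PersonTable
{
    // Splits the composite key into column names and converts each row
    // into a Person by looking values up by column name, not by position.
    public static List<Person> ToPeople(string compositeKey, IEnumerable<IList<object>> rows)
    {
        string[] columns = compositeKey.Split(',').Select(c => c.Trim()).ToArray();

        var people = new List<Person>();
        foreach (var row in rows)
        {
            if (row.Count != columns.Length)
                throw new FormatException($"Expected {columns.Length} values but found {row.Count}.");

            // Pair every column name with the value at the same position.
            var cells = new Dictionary<string, object>(StringComparer.Ordinal);
            for (int i = 0; i < columns.Length; i++)
                cells[columns[i]] = row[i];

            people.Add(new Person(
                GetString(cells, "FirstName"),
                GetString(cells, "LastName"),
                GetInt(cells, "Age"),
                GetInt(cells, "Height")));
        }
        return people;
    }

    // A missing column yields null instead of shifting the other values.
    private static string GetString(Dictionary<string, object> cells, string column) =>
        cells.TryGetValue(column, out var value) ? Convert.ToString(value, CultureInfo.InvariantCulture) : null;

    private static int? GetInt(Dictionary<string, object> cells, string column) =>
        cells.TryGetValue(column, out var value)
            ? Convert.ToInt32(value, CultureInfo.InvariantCulture)
            : (int?)null;
}
```

Lookups go through column names. As a result, the File2 row `[Mary, Ryan, 169]` and the File3 row `[169, Mary, Ryan]` both produce a `Person` with a Height of 169 and no Age.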
If it helps you understand all this, the following is a JSON dump of the first file's contents. It shows clearly what the keys consist of:
```json
{"Groups": [{"FirstName, LastName, Age, Height": [["Joe", "Soap", 21,
  184], ["Mary", "Ryan", 20, 169], ["Alex", "Dole", 24, 174]],
  "Name": "ATeam"}]}
```
I used the Python-based `ruamel.yaml` library for this (of which I am the author), but you could also use an online converter/checker like http://yaml-online-parser.appspot.com/