Reading https://avro.apache.org/docs/current/spec.html, I see it says a schema must be one of several forms, including:
{"type": "typeName" ...attributes...} where typeName is either a
primitive or derived type name, as defined below. Attributes not
defined in this document are permitted as metadata, but must not
affect the format of serialized data.
I want a schema that describes a tree, using the recursive definition that a tree is either:
- a node, holding a long value and a list of child trees, or
- a leaf, holding just a long value.
My initial attempt looked like:
{
  "name": "Tree",
  "type": [
    {
      "name": "Node",
      "type": "record",
      "fields": [
        {
          "name": "value",
          "type": "long"
        },
        {
          "name": "children",
          "type": { "type": "array", "items": "Tree" }
        }
      ]
    },
    {
      "name": "Leaf",
      "type": "record",
      "fields": [
        {
          "name": "value",
          "type": "long"
        }
      ]
    }
  ]
}
But the Avro compiler rejects this, complaining that there is nothing of type {"name":"Tree","type":[{"name":"Node".... It seems Avro doesn't accept a union type at the top level. I'm guessing this falls under the rule quoted above: a schema must be "a JSON object ... where typeName is either a primitive or derived type name." I'm not sure what a "derived type name" is, though. At first I thought it meant the same thing as a "complex type", but complex types include unions.
Anyway, changing it to this more convoluted definition:
{
  "name": "Tree",
  "type": "record",
  "fields": [{
    "name": "ctors",
    "type": [
      {
        "name": "Node",
        "type": "record",
        "fields": [
          {
            "name": "value",
            "type": "long"
          },
          {
            "name": "children",
            "type": { "type": "array", "items": "Tree" }
          }
        ]
      },
      {
        "name": "Leaf",
        "type": "record",
        "fields": [
          {
            "name": "value",
            "type": "long"
          }
        ]
      }
    ]
  }]
}
works, but now I have this weird record with just a single field whose sole purpose is to let me define the top-level union type I want.
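The difference between the two attempts can be illustrated with a toy classifier for the three schema forms the spec lists: a JSON string naming a type, a JSON object, or a JSON array representing a union. `schema_form` is a hypothetical helper that only inspects JSON shape according to the spec's wording; it is not Avro's parser. In the object form, "type" must be a type name string, so the first attempt's list does not fit. A record field's "type", on the other hand, may be any schema, including a union array, which is why the wrapper record is accepted.

```python
# Abbreviated copies of the two schemas from the question, as parsed JSON.
LEAF = {"name": "Leaf", "type": "record",
        "fields": [{"name": "value", "type": "long"}]}
NODE = {"name": "Node", "type": "record",
        "fields": [{"name": "value", "type": "long"},
                   {"name": "children",
                    "type": {"type": "array", "items": "Tree"}}]}

first_attempt = {"name": "Tree", "type": [NODE, LEAF]}
workaround = {"name": "Tree", "type": "record",
              "fields": [{"name": "ctors", "type": [NODE, LEAF]}]}


def schema_form(schema):
    """Classify parsed JSON by the spec's three schema forms (toy, not Avro's parser)."""
    if isinstance(schema, str):
        return "name"            # a JSON string naming a defined type
    if isinstance(schema, list):
        return "union"           # a JSON array is a union of its branches
    if isinstance(schema, dict):
        # Object form: "type" must be a type *name* such as "record" or "long".
        if isinstance(schema.get("type"), str):
            return "object"
        return "invalid object"  # e.g. a union array placed in "type"
    return "invalid"


print(schema_form(first_attempt))                    # invalid object
print(schema_form(workaround))                       # object
print(schema_form(workaround["fields"][0]["type"]))  # union
```

In other words, the union itself was never the problem. It just has to appear somewhere a full schema is allowed (top level as a bare array, or a field's "type"), not as the value of an object-form "type".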
Is this the only way to get what I want in Avro or is there a better way?
Thanks!
If you represent a Tree as a node, and a Leaf as a node with an empty list of children, you can avoid the named-union problem entirely and express the whole thing with a single recursive type:
{
  "type": "record",
  "name": "TreeNode",
  "fields": [
    {
      "name": "value",
      "type": "long"
    },
    {
      "name": "children",
      "type": { "type": "array", "items": "TreeNode" }
    }
  ]
}
Now your three types, Tree, Node, and Leaf, are unified into one type, TreeNode, and no union of Node and Leaf is needed.
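To make this concrete, here is a minimal Python sketch of TreeNode values modelled as plain dicts, one dict per record. That is the shape Python Avro libraries such as fastavro generally accept for records, but treat it as an assumption and check your library. The helpers `make_leaf`, `make_node`, `is_leaf`, `conforms` and `leaf_values` are hypothetical names for illustration, and `conforms` is a hand-rolled structural check, not Avro's own validator.

```python
# Values of the TreeNode schema, modelled as plain dicts (one dict per record).
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1  # range of an Avro "long"


def make_leaf(value):
    """A Leaf is just a TreeNode with an empty children list."""
    return {"value": value, "children": []}


def make_node(value, children):
    return {"value": value, "children": list(children)}


def is_leaf(node):
    return not node["children"]


def conforms(node):
    """Hand-rolled structural check against TreeNode (not Avro's validator)."""
    if not isinstance(node, dict) or set(node) != {"value", "children"}:
        return False
    value = node["value"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    if not LONG_MIN <= value <= LONG_MAX:
        return False
    children = node["children"]
    return isinstance(children, list) and all(conforms(c) for c in children)


def leaf_values(node):
    """Collect the values of all leaves, left to right."""
    if is_leaf(node):
        return [node["value"]]
    return [v for child in node["children"] for v in leaf_values(child)]


tree = make_node(1, [make_leaf(2), make_node(3, [make_leaf(4), make_leaf(5)])])
print(conforms(tree))      # True
print(leaf_values(tree))   # [2, 4, 5]
```

One consequence of this encoding is that a Node with no children is indistinguishable from a Leaf. That is usually what you want when leaves are defined as childless nodes, but it is worth knowing if your Leaf and Node ever need different fields.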