Reading https://avro.apache.org/docs/current/spec.html it says a schema must be one of:
{"type": "typeName" ...attributes...}
where typeName
is either a
primitive or derived type name, as defined below. Attributes not
defined in this document are permitted as metadata, but must not
affect the format of serialized data.I want a schema that describes a tree, using the recursive definition that a tree is either:
My initial attempt looked like:
{
"name": "Tree",
"type": [
{
"name": "Node",
"type": "record",
"fields": [
{
"name": "value",
"type": "long"
},
{
"name": "children",
"type": { "type": "array", "items": "Tree" }
}
]
},
{
"name": "Leaf",
"type": "record",
"fields": [
{
"name": "value",
"type": "long"
}
]
}
]
}
But the Avro compiler rejects this, complaining there is nothing of type {"name":"Tree","type":[{"name":"Node"...
. It seems Avro doesn't like the union type at the top-level. I'm guessing this falls under the aforementioned rule "a schema must be one of .. a JSON object .. where typeName is either a primitive or derived type name." I am not sure what a "derived type name" is though. At first I thought it was the same as a "complex type" but that includes union types..
Anyways, changing it to the more convoluted definition:
{
"name": "Tree",
"type": "record",
"fields": [{
"name": "ctors",
"type": [
{
"name": "Node",
"type": "record",
"fields": [
{
"name": "value",
"type": "long"
},
{
"name": "children",
"type": { "type": "array", "items": "Tree" }
}
]
},
{
"name": "Leaf",
"type": "record",
"fields": [
{
"name": "value",
"type": "long"
}
]
}
]
}]
}
works, but now I have this weird record with just a single field whose sole purpose is to let me define the top-level union type I want.
Is this the only way to get what I want in Avro or is there a better way?
Thanks!
If you represent a Tree
as a node, and a Leaf
as a node with an empty list of children, you can avoid the named union problem completely, and do this quite simply with one recursive type:
{
"type": "record",
"name": "TreeNode",
"fields": [
{
"name": "value",
"type": "long"
},
{
"name": "children",
"type": { "type": "array", "items": "TreeNode" }
}
]
}
Now, your three types Tree
, Node
, and Leaf
are unified into one type TreeNode
, and there is no union of Node
and Leaf
necessary.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With