Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to nest records in an Avro schema?

Tags:

python

avro

I'm trying to get Python to parse Avro schemas such as the following...

from avro import schema

mySchema = """
{
    "name": "person",
    "type": "record",
    "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        {
            "name": "address",
            "type": "record",
            "fields": [
                {"name": "streetaddress", "type": "string"},
                {"name": "city", "type": "string"}
            ]
        }
    ]
}"""

parsedSchema = schema.parse(mySchema)

...and I get the following exception:

avro.schema.SchemaParseException: Type property "record" not a valid Avro schema: Could not make an Avro Schema object from record.

What am I doing wrong?

like image 263
Jorge Aranda Avatar asked Aug 01 '12 17:08

Jorge Aranda


People also ask

How do you write Avro schema?

Creating Avro Schemastype − This field comes under the document as well as the under the field named fields. In case of document, it shows the type of the document, generally a record because there are multiple fields. When it is field, the type describes data type.

How does Avro schema work?

Avro has a schema-based system. A language-independent schema is associated with its read and write operations. Avro serializes the data which has a built-in schema. Avro serializes the data into a compact binary format, which can be deserialized by any application.

Does Avro have schema?

Avro is used to define the data schema for a record's value. This schema describes the fields allowed in the value, along with their data types. You apply a schema to the value portion of an Oracle NoSQL Database record using Avro bindings.

Does order of fields matter in Avro schema?

Avro serializer/deserializers operate on fields in the order they are declared. Producers and Consumers must be on a compatible schema including the field order. Do not change the order of AVRO fields. All Producers and Consumers are must be updated at the same time if you change the field order.


2 Answers

According to other sources on the web I would rewrite your second address definition:

mySchema = """
{
    "name": "person",
    "type": "record",
    "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        {
            "name": "address",
            "type": {
                        "type" : "record",
                        "name" : "AddressUSRecord",
                        "fields" : [
                            {"name": "streetaddress", "type": "string"},
                            {"name": "city", "type": "string"}
                        ]
                    }
        }
    ]
}"""
like image 182
Marco de Wit Avatar answered Oct 19 '22 05:10

Marco de Wit


Every time we provide the type as named type, the field needs to be given as:

"name":"some_name",
"type": {
          "name":"CodeClassName",
           "type":"record/enum/array"
 } 

However, if the named type is union, then we do not need an extra type field and should be usable as:

"name":"some_name",
"type": [{
          "name":"CodeClassName1",
           "type":"record",
           "fields": ...
          },
          {
           "name":"CodeClassName2",
            "type":"record",
            "fields": ...
}]

Hope this clarifies further!

like image 23
Ani Avatar answered Oct 19 '22 05:10

Ani