
Optional array in Avro schema

I'm wondering whether or not it is possible to have an optional array. Let's assume a schema like this:

{
    "type": "record",
    "name": "test_avro",
    "fields": [
        {"name": "test_field_1", "type": "long"},
        {"name": "subrecord", "type": [{
            "type": "record",
            "name": "subrecord_type",
            "fields": [{"name": "field_1", "type": "long"}]
        }, "null"]},
        {"name": "simple_array", "type": {
            "type": "array",
            "items": "string"
        }}
    ]
}

Writing an Avro record without "simple_array" results in a NullPointerException (NPE) in the DataFileWriter. Omitting "subrecord" works fine, but when I try to define the array as optional in the same way:

{"name": "simple_array",
 "type": [{
   "type": "array",
   "items": "string"
 }, "null"]
}

the NPE goes away, but I get a runtime exception instead:

AvroRuntimeException: Not an array schema: [{"type":"array","items":"string"},"null"]

Thanks.

Philipp Pahl asked Feb 23 '12 17:02



1 Answer

I think what you want here is a union of null and array:

{
    "type":"record",
    "name":"test_avro",
    "fields":[{
            "name":"test_field_1",
            "type":"long"
        },
        {
            "name":"subrecord",
            "type":[{
                    "type":"record",
                    "name":"subrecord_type",
                    "fields":[{
                            "name":"field_1",
                            "type":"long"
                        }
                    ]
                },
                "null"
            ]
        },
        {
            "name":"simple_array",
            "type":["null",
                {
                    "type":"array",
                    "items":"string"
                }
            ],
            "default":null
        }
    ]
}
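
If several fields need to be optional, the same union can be built programmatically. The following is a minimal sketch using only Python's standard json module; the helper name optional_field is hypothetical and not part of any Avro library. It lists "null" first so that the null default matches the first branch of the union, which is what the Avro specification has traditionally required for union defaults.

```python
import json


def optional_field(name, type_schema):
    """Return an Avro field dict whose type is a union of "null" and type_schema.

    Hypothetical helper for illustration only; not part of the avro package.
    """
    return {"name": name, "type": ["null", type_schema], "default": None}


# The optional array field from the schema above.
simple_array = optional_field("simple_array", {"type": "array", "items": "string"})

schema_dict = {
    "type": "record",
    "name": "test_avro",
    "fields": [
        {"name": "test_field_1", "type": "long"},
        simple_array,
    ],
}

# json.dumps turns Python's None into JSON null for the default.
schema_string = json.dumps(schema_dict, indent=4)
print(schema_string)
```

The resulting schema_string can be handed to whatever schema parser you use, in place of a hand-written JSON literal.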

When I use the above schema with sample data in Python, here's the result (schema_string is the JSON string above; the session uses Python 2 syntax):

>>> from avro import io, datafile, schema
>>> from json import dumps
>>> 
>>> sample_data = {'test_field_1':12L}
>>> rec_schema = schema.parse(schema_string)
>>> rec_writer = io.DatumWriter(rec_schema)
>>> rec_reader = io.DatumReader()
>>> 
>>> # write avro file
... df_writer = datafile.DataFileWriter(open("/tmp/foo", 'wb'), rec_writer, writers_schema=rec_schema)
>>> df_writer.append(sample_data)
>>> df_writer.close()
>>> 
>>> # read avro file
... df_reader = datafile.DataFileReader(open('/tmp/foo', 'rb'), rec_reader)
>>> print dumps(df_reader.next())
{"simple_array": null, "test_field_1": 12, "subrecord": null}
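
One way to see why the record without simple_array is accepted: the missing field is effectively null, and null matches the "null" branch of the union, whereas a bare array type has no branch that accepts it. The sketch below is a deliberately simplified matcher covering only the types used in this thread. The function name matches is made up for illustration, and this is not the avro library's own validation code.

```python
def matches(schema, datum):
    """Simplified check of whether datum fits schema (illustrative only)."""
    if isinstance(schema, list):
        # Union: the datum is valid if any branch accepts it.
        return any(matches(branch, datum) for branch in schema)
    if schema == "null":
        return datum is None
    if schema == "long":
        return isinstance(datum, int) and not isinstance(datum, bool)
    if schema == "string":
        return isinstance(datum, str)
    if isinstance(schema, dict) and schema["type"] == "array":
        return isinstance(datum, list) and all(
            matches(schema["items"], item) for item in datum
        )
    if isinstance(schema, dict) and schema["type"] == "record":
        # Assumption: a field absent from the dict is treated as None,
        # mirroring the output above where missing fields come back as null.
        return isinstance(datum, dict) and all(
            matches(field["type"], datum.get(field["name"]))
            for field in schema["fields"]
        )
    return False


array_type = {"type": "array", "items": "string"}
required = {"type": "record", "name": "r",
            "fields": [{"name": "simple_array", "type": array_type}]}
optional = {"type": "record", "name": "r",
            "fields": [{"name": "simple_array", "type": ["null", array_type]}]}

print(matches(required, {}))                            # False
print(matches(optional, {}))                            # True
print(matches(optional, {"simple_array": ["a", "b"]}))  # True
```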
kojiro answered Oct 01 '22 08:10