I'm wondering whether or not it is possible to have an optional array. Let's assume a schema like this:
{
"type": "record",
"name": "test_avro",
"fields" : [
{"name": "test_field_1", "type": "long"},
{"name": "subrecord", "type": [{
"type": "record",
"name": "subrecord_type",
"fields":[{"name":"field_1", "type":"long"}]
},"null"]
},
{"name": "simple_array",
"type":{
"type": "array",
"items": "string"
}
}
]
}
Trying to write an Avro record without "simple_array" results in a NullPointerException (NPE) in the DataFileWriter. For subrecord it works fine, but when I try to define the array as optional:
{"name": "simple_array",
"type":[{
"type": "array",
"items": "string"
}, "null"]
}
it does not result in an NPE but in a runtime exception:
AvroRuntimeException: Not an array schema: [{"type":"array","items":"string"},"null"]
Thanks.
In an Avro record schema, "aliases" is a JSON array of strings giving alternate names for the record (optional), and "fields" is a JSON array listing the record's fields (required). Enums use the type name “enum” and support several attributes, among them "name", a JSON string giving the name of the enum (required).
Arrays use the type name “array” and support a single attribute, "items", which is the schema of the array’s items. Maps use the type name “map” and support a single attribute, "values", which is the schema of the map’s values.
Avro has primitive data types as well as complex data types. Along with the primitive types, Avro provides six complex data types: records, enums, arrays, maps, unions, and fixed.
Avro schemas are written in JSON (JavaScript Object Notation), a lightweight text-based data interchange format.
I think what you want here is a union of null and array:
{
"type":"record",
"name":"test_avro",
"fields":[{
"name":"test_field_1",
"type":"long"
},
{
"name":"subrecord",
"type":[{
"type":"record",
"name":"subrecord_type",
"fields":[{
"name":"field_1",
"type":"long"
}
]
},
"null"
]
},
{
"name":"simple_array",
"type":["null",
{
"type":"array",
"items":"string"
}
],
"default":null
}
]
}
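To make the difference between the two union fields concrete, here is a minimal sketch using only Python's json module, not the Avro library. The helper names is_nullable and null_default_matches_first_branch are made up for illustration. It relies on the Avro specification's long-standing rule that a union field's default must match the first branch of the union, which is why "null" is listed first for simple_array; newer versions of the specification may be more lenient on this point.

```python
import json

# The two union-typed fields, copied from the schema in this thread.
FIELDS_JSON = """
[
  {"name": "subrecord",
   "type": [{"type": "record", "name": "subrecord_type",
             "fields": [{"name": "field_1", "type": "long"}]},
            "null"]},
  {"name": "simple_array",
   "type": ["null", {"type": "array", "items": "string"}],
   "default": null}
]
"""


def is_nullable(field):
    """True if the field's type is a union (a JSON array) that lists "null"."""
    field_type = field["type"]
    return isinstance(field_type, list) and "null" in field_type


def null_default_matches_first_branch(field):
    """True if the field declares "default": null and "null" is the first branch."""
    field_type = field["type"]
    return (
        "default" in field
        and field["default"] is None
        and isinstance(field_type, list)
        and len(field_type) > 0
        and field_type[0] == "null"
    )


fields = {f["name"]: f for f in json.loads(FIELDS_JSON)}
for name, field in fields.items():
    print(name, is_nullable(field), null_default_matches_first_branch(field))
```

Both fields accept null, but only simple_array also declares a null default, which is what a reader falls back on when the field is absent from the data it is reading.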
When I use the above schema with sample data in Python, here's the result (schema_string is the above JSON string):
>>> from avro import io, datafile, schema
>>> from json import dumps
>>>
>>> sample_data = {'test_field_1':12L}
>>> rec_schema = schema.parse(schema_string)
>>> rec_writer = io.DatumWriter(rec_schema)
>>> rec_reader = io.DatumReader()
>>>
>>> # write avro file
... df_writer = datafile.DataFileWriter(open("/tmp/foo", 'wb'), rec_writer, writers_schema=rec_schema)
>>> df_writer.append(sample_data)
>>> df_writer.close()
>>>
>>> # read avro file
... df_reader = datafile.DataFileReader(open('/tmp/foo', 'rb'), rec_reader)
>>> print dumps(df_reader.next())
{"simple_array": null, "test_field_1": 12, "subrecord": null}
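The output shows both union fields coming back as null even though only test_field_1 was supplied. One way to picture this, assuming the writer treats an absent key as null and accepts it only when the field's union lists "null", is the standard-library sketch below. It is a simplified model, not the Avro library's own code, and the names accepts_null and with_missing_as_null are hypothetical.

```python
import json


def accepts_null(field_type):
    """True for the "null" type or a union (JSON array) that lists "null"."""
    return field_type == "null" or (
        isinstance(field_type, list) and "null" in field_type
    )


def with_missing_as_null(fields, datum):
    """Return a copy of datum with absent nullable fields set to None.

    Raises ValueError if an absent field cannot hold null.
    """
    result = {}
    for field in fields:
        name = field["name"]
        if name in datum:
            result[name] = datum[name]
        elif accepts_null(field["type"]):
            result[name] = None
        else:
            raise ValueError("field %r is required" % name)
    return result


# Field definitions from the schema in this thread, written as Python literals.
fields = [
    {"name": "test_field_1", "type": "long"},
    {"name": "subrecord",
     "type": [{"type": "record", "name": "subrecord_type",
               "fields": [{"name": "field_1", "type": "long"}]},
              "null"]},
    {"name": "simple_array",
     "type": ["null", {"type": "array", "items": "string"}],
     "default": None},
]

record = with_missing_as_null(fields, {"test_field_1": 12})
print(json.dumps(record, sort_keys=True))
# {"simple_array": null, "subrecord": null, "test_field_1": 12}
```

With the original non-union array type, the same call would raise instead, which mirrors the failure described in the question.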