Optional array in Avro schema

Tags: arrays, null, optional, avro

Is it possible to have an optional array in an Avro schema? Assume a schema like this:

{
    "type": "record",
    "name": "test_avro",
    "fields": [
        {"name": "test_field_1", "type": "long"},
        {"name": "subrecord",
         "type": [{
             "type": "record",
             "name": "subrecord_type",
             "fields": [{"name": "field_1", "type": "long"}]
         }, "null"]
        },
        {"name": "simple_array",
         "type": {
             "type": "array",
             "items": "string"
         }
        }
    ]
}

Writing an Avro record without "simple_array" results in a NullPointerException (NPE) in the DataFileWriter. Leaving out "subrecord" works fine, because it is a union with "null". But when I try to define the array as optional in the same way:

{"name": "simple_array",
 "type": [{
     "type": "array",
     "items": "string"
 }, "null"]
}

the NPE goes away, but I get a runtime exception instead:

AvroRuntimeException: Not an array schema: [{"type":"array","items":"string"},"null"]

Thanks.

asked Feb 23 '12 17:02

Philipp Pahl

1 Answer

I think what you want here is a union of null and array, with "null" listed first and a default of null:

{
    "type":"record",
    "name":"test_avro",
    "fields":[{
            "name":"test_field_1",
            "type":"long"
        },
        {
            "name":"subrecord",
            "type":[{
                    "type":"record",
                    "name":"subrecord_type",
                    "fields":[{
                            "name":"field_1",
                            "type":"long"
                        }
                    ]
                },
                "null"
            ]
        },
        {
            "name":"simple_array",
            "type":["null",
                {
                    "type":"array",
                    "items":"string"
                }
            ],
            "default":null
        }
    ]
}
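
The two details that matter in the "simple_array" field are the union with "null" listed first (Avro matches a union field's default against the first branch) and the explicit "default": null. As a minimal sketch, using only the standard json module, the same field definition could be generated like this. Note that optional_field is a hypothetical helper made up for illustration; it is not part of the avro package.

```python
import json


def optional_field(name, avro_type):
    """Build a nullable field definition (hypothetical helper, not an avro API)."""
    # "null" goes first so that the null default matches the first union branch.
    return {"name": name, "type": ["null", avro_type], "default": None}


# The optional string array from the schema above.
simple_array = optional_field("simple_array", {"type": "array", "items": "string"})

# json.dumps renders Python's None as JSON null.
print(json.dumps(simple_array, indent=4))
```

The resulting dictionary can be placed in the "fields" list of a record before the whole schema is serialized with json.dumps.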

When I use the above schema with sample data that omits both optional fields, here's the result in a Python 2 session (schema_string is the above JSON string):

>>> from avro import io, datafile, schema
>>> from json import dumps
>>> 
>>> sample_data = {'test_field_1':12L}
>>> rec_schema = schema.parse(schema_string)
>>> rec_writer = io.DatumWriter(rec_schema)
>>> rec_reader = io.DatumReader()
>>> 
>>> # write avro file
... df_writer = datafile.DataFileWriter(open("/tmp/foo", 'wb'), rec_writer, writers_schema=rec_schema)
>>> df_writer.append(sample_data)
>>> df_writer.close()
>>> 
>>> # read avro file
... df_reader = datafile.DataFileReader(open('/tmp/foo', 'rb'), rec_reader)
>>> print dumps(df_reader.next())
{"simple_array": null, "test_field_1": 12, "subrecord": null}
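
As for the "Not an array schema" exception in the question: the message prints the whole union, which suggests the union was handed to code that expected the array schema itself. The question doesn't show the writing code, so this is only a guess at the cause, but the general remedy is to pick the array branch out of the union before building or inspecting the array. A minimal sketch of that idea on the parsed JSON schema, using plain Python only (array_branch is a hypothetical helper, not part of the avro package):

```python
def array_branch(field_type):
    """Return the array schema from a plain array type or from a union containing one."""
    # A union is a JSON list of branches; treat a non-union type as a single branch.
    branches = field_type if isinstance(field_type, list) else [field_type]
    for branch in branches:
        if isinstance(branch, dict) and branch.get("type") == "array":
            return branch
    raise ValueError("no array branch in %r" % (field_type,))


# The type of the optional "simple_array" field from the schema above.
union_type = ["null", {"type": "array", "items": "string"}]

# Ask the array branch, not the union, for its item type.
items = array_branch(union_type)["items"]
print(items)
```

The same function accepts the non-optional form of the field, so calling code does not need to know whether the array was declared as optional.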

answered Oct 01 '22 08:10

kojiro
