AVRO Validation

I am fairly new to Avro, so please excuse me if I am missing anything obvious. Is there an Avro validator or command-line utility that validates input against an Avro schema? Ideally, it would also point to where the error is in the JSON input.

asked Apr 16 '12 23:04 by airboss




1 Answer

Not that I'm aware of. I wrote this little Python script that will tell you if a JSON file matches a schema, but it won't tell you where the error is if there is one.

It depends on the Python avro library.

#!/usr/bin/env python3
"""Report which JSON files match a given Avro schema."""

from avro.io import validate
from avro.schema import parse
from json import loads
from sys import argv

def main(argv):
    valid = set()
    invalid_avro = set()
    invalid_json = set()

    if len(argv) < 3:
        print("Give me an avro schema file and a whitespace-separated list of json files to validate against it.")
        return

    # Parse the schema once, then check every JSON file against it.
    with open(argv[1]) as schema_file:
        schema = parse(schema_file.read())
    for arg in argv[2:]:
        try:
            with open(arg) as json_file:
                datum = loads(json_file.read())
            if validate(schema, datum):
                valid.add(arg)
            else:
                invalid_avro.add(arg)
        except ValueError:
            # json.loads raises ValueError (JSONDecodeError) on malformed JSON.
            invalid_json.add(arg)
    print(' Valid files:\n\t' + '\n\t'.join(valid))
    print('Invalid avro:\n\t' + '\n\t'.join(invalid_avro))
    print('Invalid json:\n\t' + '\n\t'.join(invalid_json))

if '__main__' == __name__:
    main(argv)
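
The script above only reports pass or fail per file. As a rough sketch of how the failing location could be reported, the following walks the schema JSON directly instead of going through the avro library. `find_errors` and `PRIMITIVES` are hypothetical names invented for this illustration, not part of any Avro API, and the sketch assumes a simplified subset of Avro: primitives, records, arrays, maps, enums and unions, with no named-type references, defaults or logical types.

```python
import json

# Python types accepted for each Avro primitive (a simplifying assumption).
PRIMITIVES = {
    "null": type(None),
    "boolean": bool,
    "int": int,
    "long": int,
    "float": (int, float),
    "double": (int, float),
    "string": str,
    "bytes": (bytes, str),
}


def find_errors(schema, datum, path="$"):
    """Return a list of 'path: message' strings; an empty list means a match."""
    if isinstance(schema, list):
        # Union: the datum must match at least one branch.
        if any(not find_errors(branch, datum, path) for branch in schema):
            return []
        return [f"{path}: value matches no branch of the union"]
    kind = schema if isinstance(schema, str) else schema["type"]
    if isinstance(kind, (dict, list)):
        # Nested type definition, e.g. {"type": {"type": "array", ...}}.
        return find_errors(kind, datum, path)
    if kind in PRIMITIVES:
        ok = isinstance(datum, PRIMITIVES[kind])
        # bool is a subclass of int in Python, so reject it for numeric types.
        if kind != "boolean" and isinstance(datum, bool):
            ok = False
        return [] if ok else [f"{path}: expected {kind}, got {type(datum).__name__}"]
    if kind == "record":
        if not isinstance(datum, dict):
            return [f"{path}: expected record, got {type(datum).__name__}"]
        errors = []
        for field in schema["fields"]:
            name = field["name"]
            # A missing key is treated as None (field defaults are ignored here).
            errors += find_errors(field["type"], datum.get(name), f"{path}.{name}")
        return errors
    if kind == "array":
        if not isinstance(datum, list):
            return [f"{path}: expected array, got {type(datum).__name__}"]
        errors = []
        for index, item in enumerate(datum):
            errors += find_errors(schema["items"], item, f"{path}[{index}]")
        return errors
    if kind == "map":
        if not isinstance(datum, dict):
            return [f"{path}: expected map, got {type(datum).__name__}"]
        errors = []
        for key, value in datum.items():
            errors += find_errors(schema["values"], value, f"{path}[{key!r}]")
        return errors
    if kind == "enum":
        if datum in schema["symbols"]:
            return []
        return [f"{path}: {datum!r} is not one of {schema['symbols']}"]
    return [f"{path}: unsupported schema type {kind!r}"]


# Made-up schema and input, purely for illustration.
schema = json.loads('''{"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "emails", "type": {"type": "array", "items": "string"}}]}''')
datum = json.loads('{"name": "Ann", "age": "41", "emails": ["a@example.com", 7]}')

for error in find_errors(schema, datum):
    print(error)
# $.age: expected int, got str
# $.emails[1]: expected string, got int
```

Collecting every mismatch with its path, rather than stopping at the first one, makes it possible to fix a bad input file in one pass. Anything beyond the listed subset would need extra handling.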
answered Oct 02 '22 01:10 by kojiro