I am fairly new to Avro, so please excuse me if I am missing anything obvious. Is there an Avro validator or command-line utility that validates input against an Avro schema, or at least points to where the error is in the JSON input?
avro-tools is an external tool that can convert Avro files to JSON or text and vice versa. Once the data is imported, we can copy the files from HDFS to the local file system and run the avro-tools tojson command to convert an Avro file into JSON.
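As a rough illustration of those two steps, here is a small Python 3 sketch that builds and runs the commands. The jar location and file paths are assumptions made up for the example; only the `hdfs dfs -copyToLocal` and `tojson` subcommands come from the tools themselves. The output handling assumes avro-tools prints one JSON record per line, which is what it typically does without extra flags.

```python
import subprocess


def copy_from_hdfs_command(hdfs_path, local_path):
    """Build the command that copies a file from HDFS to the local disk."""
    return ["hdfs", "dfs", "-copyToLocal", hdfs_path, local_path]


def tojson_command(avro_tools_jar, avro_file):
    """Build the avro-tools command that dumps an Avro file as JSON."""
    # avro_tools_jar is a hypothetical path to the avro-tools jar you downloaded.
    return ["java", "-jar", avro_tools_jar, "tojson", avro_file]


def avro_file_to_json_lines(avro_tools_jar, avro_file):
    """Run avro-tools and return its standard output split into lines."""
    result = subprocess.run(
        tojson_command(avro_tools_jar, avro_file),
        check=True,           # raise CalledProcessError if the tool fails
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode the output as text
    )
    return result.stdout.splitlines()
```

Calling `avro_file_to_json_lines("avro-tools.jar", "part-00000.avro")` would then return the records as JSON strings, provided Java and the jar are available on the machine.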
First you generate Java classes from your Avro schema using the Apache Avro Maven plugin (which is configured differently than documented). Next you serialize a JSON object using libraries from the Jackson project and the generated classes. During serialization, you will get clear exceptions if the input does not fit the schema.
JSON is based on a subset of the JavaScript programming language. Avro can be classified as a tool in the "Serialization Frameworks" category, while JSON is grouped under "Languages". Redsift, OTTLabs, and Mon Style are some of the popular companies that use JSON, whereas Avro is used by Liferay, LendUp, and BetterCloud.
Avro creates a binary structured format that is both compressible and splittable, so it can be used efficiently as the input to Hadoop MapReduce jobs. Avro also provides rich data structures. For example, you can create a record that contains an array, an enumerated type, and a sub-record.
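To make that last example concrete, here is a sketch of such a schema written as a Python dictionary and dumped as JSON, which is the form an Avro schema file takes. Every name in it ("Order", "Status", "Customer" and the field names) is invented for illustration.

```python
import json

# Hypothetical schema: a record holding an array, an enum and a sub-record.
ORDER_SCHEMA = {
    "type": "record",
    "name": "Order",
    "namespace": "example.avro",
    "fields": [
        {"name": "id", "type": "long"},
        # An array of strings.
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        # An enumerated type with a fixed set of symbols.
        {
            "name": "status",
            "type": {
                "type": "enum",
                "name": "Status",
                "symbols": ["NEW", "SHIPPED", "CANCELLED"],
            },
        },
        # A nested record with an optional field (a union with null).
        {
            "name": "customer",
            "type": {
                "type": "record",
                "name": "Customer",
                "fields": [
                    {"name": "name", "type": "string"},
                    {"name": "email", "type": ["null", "string"], "default": None},
                ],
            },
        },
    ],
}


def schema_as_avsc(schema):
    """Return the schema as the JSON text you would save in a .avsc file."""
    return json.dumps(schema, indent=2)


if __name__ == "__main__":
    print(schema_as_avsc(ORDER_SCHEMA))
```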
Not that I'm aware of. I wrote this little Python script that will tell you whether a JSON file matches a schema, but it won't tell you where the error is if there is one.
It depends on the Python avro library.
#!/usr/bin/env python
"""Check JSON files against an Avro schema (requires the Python avro library)."""
from json import loads
from sys import argv

from avro.io import validate
from avro.schema import parse  # named Parse in the old avro-python3 package


def main(argv):
    valid = set()
    invalid_avro = set()
    invalid_json = set()

    if len(argv) < 3:
        print("Give me an avro schema file and a whitespace-separated list of json files to validate against it.")
    else:
        # argv[1] is the schema file; every remaining argument is a JSON file.
        with open(argv[1]) as schema_file:
            schema = parse(schema_file.read())
        for arg in argv[2:]:
            try:
                with open(arg) as json_file:
                    datum = loads(json_file.read())
                if validate(schema, datum):
                    valid.add(arg)
                else:
                    invalid_avro.add(arg)
            except ValueError:
                # loads() raises ValueError when the file is not well-formed JSON.
                invalid_json.add(arg)
        print(' Valid files:\n\t' + '\n\t'.join(valid))
        print('Invalid avro:\n\t' + '\n\t'.join(invalid_avro))
        print('Invalid json:\n\t' + '\n\t'.join(invalid_json))


if '__main__' == __name__:
    main(argv)
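Since the script above only says whether a file matches, here is a rough Python 3 sketch of how one might also locate the first mismatch. It does not use the avro library: `find_error` and `PRIMITIVES` are hypothetical helpers that walk the schema as plain parsed JSON, and they cover only a simplified subset of Avro (no bytes, fixed, named type references, defaults or integer range checks).

```python
def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value):
    return _is_int(value) or isinstance(value, float)


# Checks for the primitive types this sketch supports.
PRIMITIVES = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": _is_int,
    "long": _is_int,
    "float": _is_number,
    "double": _is_number,
    "string": lambda v: isinstance(v, str),
}


def find_error(schema, datum, path="$"):
    """Return a message naming the first mismatch, or None if datum fits."""
    if isinstance(schema, list):  # union: any one branch may match
        if any(find_error(branch, datum, path) is None for branch in schema):
            return None
        return path + ": no union branch matches"
    if isinstance(schema, str):  # shorthand such as "string"
        schema = {"type": schema}
    kind = schema["type"]
    if kind in PRIMITIVES:
        return None if PRIMITIVES[kind](datum) else "%s: expected %s" % (path, kind)
    if kind == "enum":
        return None if datum in schema["symbols"] else path + ": not an enum symbol"
    if kind == "array":
        if not isinstance(datum, list):
            return path + ": expected array"
        for index, item in enumerate(datum):
            error = find_error(schema["items"], item, "%s[%d]" % (path, index))
            if error:
                return error
        return None
    if kind == "map":
        if not isinstance(datum, dict):
            return path + ": expected map"
        for key, value in datum.items():
            error = find_error(schema["values"], value, "%s[%r]" % (path, key))
            if error:
                return error
        return None
    if kind == "record":
        if not isinstance(datum, dict):
            return path + ": expected record"
        for field in schema["fields"]:
            # A missing field is checked as None, so only nullable fields may be absent.
            error = find_error(field["type"], datum.get(field["name"]),
                               path + "." + field["name"])
            if error:
                return error
        return None
    return "%s: unsupported type %r" % (path, kind)
```

You would load both the schema file and the data file with `json.loads` and call `find_error(schema, datum)`; a result such as `$.scores[1]: expected int` points at the second element of the `scores` field.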