Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to avro binary encode my json string to a byte array?

I have a actual JSON String which I need to avro binary encode to a byte array. After going through the Apache Avro specification, I came up with the below code.

I am not sure whether this is the right way to do it or not. Can anyone take a look whether the way I am trying to avro binary encode my JSON String is correct or not?. I am using Apache Avro 1.7.7 version.

public class AvroTest {

    private static final String json = "{" + "\"name\":\"Frank\"," + "\"age\":47" + "}";
    private static final String schema = "{ \"type\":\"record\", \"namespace\":\"foo\", \"name\":\"Person\", \"fields\":[ { \"name\":\"name\", \"type\":\"string\" }, { \"name\":\"age\", \"type\":\"int\" } ] }";

    public static void main(String[] args) throws IOException {
        byte[] data = jsonToAvro(json, schema);

        String jsonString = avroToJson(data, schema);
        System.out.println(jsonString);
    }

    /**
     * Convert JSON to avro binary array.
     * 
     * @param json
     * @param schemaStr
     * @return
     * @throws IOException
     */
    public static byte[] jsonToAvro(String json, String schemaStr) throws IOException {
        InputStream input = null;
        GenericDatumWriter<Object> writer = null;
        Encoder encoder = null;
        ByteArrayOutputStream output = null;
        try {
            Schema schema = new Schema.Parser().parse(schemaStr);
            DatumReader<Object> reader = new GenericDatumReader<Object>(schema);
            input = new ByteArrayInputStream(json.getBytes());
            output = new ByteArrayOutputStream();
            DataInputStream din = new DataInputStream(input);
            writer = new GenericDatumWriter<Object>(schema);
            Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
            encoder = EncoderFactory.get().binaryEncoder(output, null);
            Object datum;
            while (true) {
                try {
                    datum = reader.read(null, decoder);
                } catch (EOFException eofe) {
                    break;
                }
                writer.write(datum, encoder);
            }
            encoder.flush();
            return output.toByteArray();
        } finally {
            try {
                input.close();
            } catch (Exception e) {
            }
        }
    }

    /**
     * Convert Avro binary byte array back to JSON String.
     * 
     * @param avro
     * @param schemaStr
     * @return
     * @throws IOException
     */
    public static String avroToJson(byte[] avro, String schemaStr) throws IOException {
        boolean pretty = false;
        GenericDatumReader<Object> reader = null;
        JsonEncoder encoder = null;
        ByteArrayOutputStream output = null;
        try {
            Schema schema = new Schema.Parser().parse(schemaStr);
            reader = new GenericDatumReader<Object>(schema);
            InputStream input = new ByteArrayInputStream(avro);
            output = new ByteArrayOutputStream();
            DatumWriter<Object> writer = new GenericDatumWriter<Object>(schema);
            encoder = EncoderFactory.get().jsonEncoder(schema, output, pretty);
            Decoder decoder = DecoderFactory.get().binaryDecoder(input, null);
            Object datum;
            while (true) {
                try {
                    datum = reader.read(null, decoder);
                } catch (EOFException eofe) {
                    break;
                }
                writer.write(datum, encoder);
            }
            encoder.flush();
            output.flush();
            return new String(output.toByteArray());
        } finally {

        }
    }

}
like image 249
john Avatar asked Dec 30 '14 22:12

john


People also ask

What is JSON encoded Avro?

Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.

Is Avro a binary format?

Avro is a data serialization system. Avro provides: Rich data structures. A compact, fast, binary data format.

What is the difference between Avro and JSON?

Avro has a JSON like data model, but can be represented as either JSON or in a compact binary form. It comes with a very sophisticated schema description language that describes data. We think Avro is the best choice for a number of reasons: It has a direct mapping to and from JSON.


1 Answers

It seems to be working at least. It can be simplified though: the loops are useless since having more than one object would result in invalid JSON. Also, it could be a good idea to avoid unnecessary parsing of the schema by preparsing it.

Here's my version:

public class AvroTest {

    private static final String JSON = "{" + "\"name\":\"Frank\"," + "\"age\":47" + "}";
    private static final Schema SCHEMA = new Schema.Parser().parse("{ \"type\":\"record\", \"namespace\":\"foo\", \"name\":\"Person\", \"fields\":[ { \"name\":\"name\", \"type\":\"string\" }, { \"name\":\"age\", \"type\":\"int\" } ] }");

    public static void main(String[] args) throws IOException {
        byte[] data = jsonToAvro(JSON, SCHEMA);

        String jsonString = avroToJson(data, SCHEMA);
        System.out.println(jsonString);
    }

    /**
     * Convert JSON to avro binary array.
     * 
     * @param json
     * @param schema
     * @return
     * @throws IOException
     */
    public static byte[] jsonToAvro(String json, Schema schema) throws IOException {
        DatumReader<Object> reader = new GenericDatumReader<>(schema);
        GenericDatumWriter<Object> writer = new GenericDatumWriter<>(schema);
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        Decoder decoder = DecoderFactory.get().jsonDecoder(schema, json);
        Encoder encoder = EncoderFactory.get().binaryEncoder(output, null);
        Object datum = reader.read(null, decoder);
        writer.write(datum, encoder);
        encoder.flush();
        return output.toByteArray();
    }

    /**
     * Convert Avro binary byte array back to JSON String.
     * 
     * @param avro
     * @param schema
     * @return
     * @throws IOException
     */
    public static String avroToJson(byte[] avro, Schema schema) throws IOException {
        boolean pretty = false;
        GenericDatumReader<Object> reader = new GenericDatumReader<>(schema);
        DatumWriter<Object> writer = new GenericDatumWriter<>(schema);
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, output, pretty);
        Decoder decoder = DecoderFactory.get().binaryDecoder(avro, null);
        Object datum = reader.read(null, decoder);
        writer.write(datum, encoder);
        encoder.flush();
        output.flush();
        return new String(output.toByteArray(), "UTF-8");
    }
}
like image 58
Maurice Perry Avatar answered Oct 19 '22 08:10

Maurice Perry