
How to get typed value from GenericRecord?

Tags: java, avro

I am working with Avro and I have a GenericRecord. I want to extract clientId, deviceName, holder from it. In the Avro Schema, clientId is Integer, deviceName is String and holder is a Map.

clientId in the avro schema:

{
    "name" : "clientId",
    "type" : [ "null", "int" ],
    "doc" : "hello"
}

deviceName in the avro schema:

{
    "name" : "deviceName",
    "type" : [ "null", "string" ],
    "doc" : "test"
}

holder in the avro schema:

{
    "name" : "holder",
    "type" : {
      "type" : "map",
      "values" : "string"
    }
}

My question is - what is the recommended way to retrieve a typed value, as opposed to an Object?

In the code below, payload is a GenericRecord, and we can get the Avro schema from it. Right now I extract everything as a String. How can I get typed values instead? That is, whatever data type the Avro schema declares for a field is the type I want to extract.

  public static void getData(GenericRecord payload) {
    String id = String.valueOf(payload.get("clientId"));
    String name = String.valueOf(payload.get("deviceName"));

    // not sure how to get maps here
  }

So I want to extract clientId as an Integer, deviceName as a String and holder as a Java Map<String, String> from the GenericRecord. What is the best way to do that? Can we write a utility that does all the typed conversions, given a generic record and its schema?

john asked Nov 24 '16 18:11



2 Answers

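You should be able to cast your string values to Utf8, int to Integer, and map to Map<Utf8, Utf8>. For a record deserialized by Avro these casts should not cause a ClassCastException. Note that clientId and deviceName are declared as [ "null", "int" ] and [ "null", "string" ], so either can be null; keep clientId boxed and null-check deviceName before calling toString():

public static void getData(GenericRecord payload) {
    Integer id = (Integer) payload.get("clientId"); // may be null, so do not unbox to int
    Object rawName = payload.get("deviceName");     // Utf8 when deserialized, or null
    String name = rawName == null ? null : rawName.toString(); // calls Utf8.toString
    @SuppressWarnings("unchecked")
    Map<Utf8, Utf8> holder = (Map<Utf8, Utf8>) payload.get("holder");

    ...
}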

In general, I believe you can do these casts:

  • primitives become their boxed version (Integer, Double, etc.)
  • string becomes Utf8
  • bytes becomes java.nio.ByteBuffer
  • array becomes java.util.Collection
  • map becomes java.util.Map<Utf8, [value type]>
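
The casts listed above could be wrapped in a small null-safe helper. This is only a sketch: the class GenericValues and its method names are made up for illustration and are not part of Avro. It uses JDK types only and relies on Utf8 implementing CharSequence, so the result of payload.get(...) can be passed in directly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Avro): null-safe conversions for the
// Object values returned by GenericRecord.get(...).
final class GenericValues {
    private GenericValues() {
    }

    // For an "int" or ["null", "int"] field. Returns null for a missing value;
    // throws ClassCastException if the value is not an Integer.
    static Integer asInteger(Object raw) {
        return (Integer) raw;
    }

    // For a "string" or ["null", "string"] field. Accepts any CharSequence,
    // which covers both java.lang.String and Avro's Utf8.
    static String asString(Object raw) {
        return raw == null ? null : ((CharSequence) raw).toString();
    }

    // For a "map" field with string values. Copies the entries into a new
    // Map<String, String> so that lookups by String key work.
    static Map<String, String> asStringMap(Object raw) {
        if (raw == null) {
            return null;
        }
        Map<String, String> copy = new LinkedHashMap<>();
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) raw).entrySet()) {
            copy.put(asString(entry.getKey()), asString(entry.getValue()));
        }
        return copy;
    }
}
```

With such a helper, getData could call GenericValues.asInteger(payload.get("clientId")), GenericValues.asString(payload.get("deviceName")) and GenericValues.asStringMap(payload.get("holder")). Copying the map costs an extra allocation, but a plain cast to Map<String, String> would leave Utf8 keys in place and holder.get("someKey") would find nothing.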
qxz answered Oct 20 '22 00:10



You may try this approach. For a robust implementation, consider generating classes from the schema (schema compilation) instead.

package stackoverflow;

import static org.hamcrest.CoreMatchers.is;
import static org.junit.Assert.assertThat;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.AvroTypeException;
import org.apache.avro.Schema;
import org.apache.avro.Schema.Field;
import org.apache.avro.Schema.Type;
import org.apache.avro.generic.GenericData.Record;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.util.Utf8;
import org.junit.Test;

// Just for demonstration; not a robust implementation
public class GenericRecordType {
    @Test
    public void testName() throws Exception {
        Schema schema = buildSchema();

        GenericRecord record = new Record(schema);
        record.put("clientId", 12);
        record.put("deviceName", "GlassScanner");
        record.put("holder", new HashMap<>());

        Integer value = IntField.clientId.getValue(record);
        String deviceName = StringField.deviceName.getValue(record);
        Map<String, String> mapString = MapOfStringField.holder.getValue(record);

        assertThat(deviceName, is("GlassScanner"));
        assertThat(value, is(12));
        assertThat(mapString.size(), is(0));
    }

    private Schema buildSchema() {
        Field clientId = new Field("clientId", Schema.create(Type.INT), "hello", (Object) null);
        Field deviceName = new Field("deviceName", Schema.create(Type.STRING), "hello", (Object) null);
        Field holder = new Field("holder", Schema.createMap(Schema.create(Type.STRING)), null, (Object) null);
        Schema schema = Schema.createRecord(Arrays.asList(clientId, deviceName, holder));
        return schema;
    }

    public static interface TypedField<T> {
        String name();

        public T getValue(GenericRecord record);

    }

    public static enum StringField implements TypedField<String> {
        deviceName;

        @Override
        public String getValue(GenericRecord record) {
            String typed = null;
            Object raw = record.get(name());
            if (raw != null) {
                if (!(raw instanceof String || raw instanceof Utf8)) {
                    throw new AvroTypeException("string type was expected for field:" + name());
                }
                typed = raw.toString();
            }
            return typed;
        }

    }

    public static enum IntField implements TypedField<Integer> {
        clientId;

        private IntField() {
        }

        @Override
        public Integer getValue(GenericRecord record) {
            Integer typed = null;
            Object raw = record.get(name());
            if (raw != null) {
                if (!(raw instanceof Integer)) {
                    throw new AvroTypeException("int type was expected for field:" + name());
                }
                typed = (Integer) raw;
            }
            return typed;
        }

    }

    public static enum MapOfStringField implements TypedField<Map<String, String>> {
        holder;

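        @Override
        public Map<String, String> getValue(GenericRecord record) {
            Map<String, String> typed = null;
            Object raw = record.get(name());
            if (raw != null) {
                if (!(raw instanceof Map)) {
                    throw new AvroTypeException("map type was expected for field:" + name());
                }
                // Deserialized maps hold Utf8 keys and values, so copy them as Strings
                typed = new HashMap<>();
                for (Map.Entry<?, ?> entry : ((Map<?, ?>) raw).entrySet()) {
                    typed.put(String.valueOf(entry.getKey()), String.valueOf(entry.getValue()));
                }
            }
            return typed;
        }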
    }

}
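
The per-type enums above could probably be collapsed into a single generic accessor that carries a Class token. The following is a rough sketch under that assumption: TypedKey and its methods are hypothetical names, and the lookup function stands in for GenericRecord.get(String), so the code compiles against the JDK alone.

```java
import java.util.function.Function;

// Hypothetical sketch (not part of Avro): one generic typed accessor
// instead of one enum per Java type.
final class TypedKey<T> {
    private final String name;
    private final Class<T> type;

    private TypedKey(String name, Class<T> type) {
        this.name = name;
        this.type = type;
    }

    static <T> TypedKey<T> of(String name, Class<T> type) {
        return new TypedKey<>(name, type);
    }

    // lookup is a stand-in for GenericRecord.get(String).
    // Returns null for a missing value and fails fast on a type mismatch.
    T get(Function<String, Object> lookup) {
        Object raw = lookup.apply(name);
        if (raw == null) {
            return null;
        }
        if (!type.isInstance(raw)) {
            throw new IllegalArgumentException(type.getSimpleName()
                    + " was expected for field: " + name
                    + ", but found " + raw.getClass().getName());
        }
        return type.cast(raw);
    }
}
```

Usage would look like TypedKey.of("clientId", Integer.class).get(record::get). For string fields, CharSequence.class followed by toString() should cover both String and Utf8 values. The Class token cannot express generic types such as Map<String, String>, so a map field would still need its own conversion step.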
skadya answered Oct 20 '22 00:10
