 

Avro schema definition nesting types

Tags:

avro

I am fairly new to Avro and am going through the documentation for nested types. I have the example below working nicely, but many different types within the model will have addresses. Is it possible to define an address.avsc file and reference that as a nested type? If that is possible, can you also take it a step further and have a list of Addresses for a Customer? Thanks in advance.

{"namespace": "com.company.model",
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "lastname", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "phone", "type": "string"},
    {"name": "address", "type":
      {"type": "record",
       "name": "AddressRecord",
       "fields": [
         {"name": "streetaddress", "type": "string"},
         {"name": "city", "type": "string"},
         {"name": "state", "type": "string"},
         {"name": "zip", "type": "string"}
       ]}
    }
  ]
}
derdc asked Mar 26 '15 14:03



2 Answers

There are 4 possible ways:

  1. Including it in the pom file (the avro-maven-plugin imports setting), as mentioned in this ticket.
  2. Declare all your types in a single avsc file.
  3. Use a single static parser that first parses all the imported types and then parses the actual data types.
  4. (This is a hack) Use an avdl file with imports, as described at https://avro.apache.org/docs/1.7.7/idl.html#imports . Note, though, that IDL is intended for RPC protocols.

Example for 2 (declare all your types in a single avsc file). This also shows how to declare the address field as an array:

[
{
    "type": "record",
    "namespace": "com.company.model",
    "name": "AddressRecord",
    "fields": [
        {
            "name": "streetaddress",
            "type": "string"
        },
        {
            "name": "city",
            "type": "string"
        },
        {
            "name": "state",
            "type": "string"
        },
        {
            "name": "zip",
            "type": "string"
        }
    ]
},
{
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer",
    "fields": [
        {
            "name": "firstname",
            "type": "string"
        },
        {
            "name": "lastname",
            "type": "string"
        },
        {
            "name": "email",
            "type": "string"
        },
        {
            "name": "phone",
            "type": "string"
        },
        {
            "name": "address",
            "type": {
                "type": "array",
                "items": "com.company.model.AddressRecord"
            }
        }
    ]
},
{
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer2",
    "fields": [
        {
            "name": "x",
            "type": "string"
        },
        {
            "name": "y",
            "type": "string"
        },
        {
            "name": "address",
            "type": {
                "type": "array",
                "items": "com.company.model.AddressRecord"
            }
        }
    ]
}
]
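A minimal sketch of what parsing such a combined file gives you, assuming the JSON above is inlined as a string (shortened to two records) instead of being read from disk; the class and method names are hypothetical. Schema.Parser.parse() returns a UNION schema whose branches are the records, and the parser's getTypes() map lets you look up one record by its full name.

```java
import java.util.Map;
import org.apache.avro.Schema;

class CombinedSchemaSketch {
    // Shortened version of the combined file above: a JSON array of two records.
    static final String COMBINED = "["
        + "{\"type\":\"record\",\"namespace\":\"com.company.model\",\"name\":\"AddressRecord\","
        + "\"fields\":[{\"name\":\"city\",\"type\":\"string\"}]},"
        + "{\"type\":\"record\",\"namespace\":\"com.company.model\",\"name\":\"Customer\","
        + "\"fields\":[{\"name\":\"address\",\"type\":{\"type\":\"array\","
        + "\"items\":\"com.company.model.AddressRecord\"}}]}"
        + "]";

    static Schema.Type topLevelType() {
        // A top-level JSON array is parsed as a union of its elements.
        return new Schema.Parser().parse(COMBINED).getType();
    }

    static Schema customerSchema() {
        Schema.Parser parser = new Schema.Parser();
        parser.parse(COMBINED);
        // getTypes() maps each named type's full name to its Schema.
        Map<String, Schema> types = parser.getTypes();
        return types.get("com.company.model.Customer");
    }

    public static void main(String[] args) {
        System.out.println(topLevelType());             // UNION
        System.out.println(customerSchema().getType()); // RECORD
    }
}
```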

Example for 3 (using a single static parser):

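Schema.Parser parser = new Schema.Parser(); // Make this static and reuse it
// Parse address.avsc first so the later files can refer to AddressRecord by name
parser.parse(new File("<location of address.avsc file>"));
parser.parse(new File("<location of customer.avsc file>"));
parser.parse(new File("<location of customer2.avsc file>"));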

If we need the Schema objects themselves, for example to create new records, we can either call the parser's getTypes() method (https://avro.apache.org/docs/1.5.4/api/java/org/apache/avro/Schema.Parser.html#getTypes()) to get the schemas, or keep the return value of each parse call:

Schema.Parser parser = new Schema.Parser(); // Make this static and reuse it
Schema addressSchema = parser.parse(new File("<location of address.avsc file>"));
Schema customerSchema = parser.parse(new File("<location of customer.avsc file>"));
Schema customer2Schema = parser.parse(new File("<location of customer2.avsc file>"));
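A self-contained sketch of option 3, assuming the .avsc files are replaced by inline strings (schema text shortened; class and method names are hypothetical). The point it illustrates is that the same Schema.Parser instance parses AddressRecord first, so the Customer schema parsed afterwards can refer to it by its full name, here as the items of an array.

```java
import org.apache.avro.Schema;

class SharedParserSketch {
    // Stand-in for address.avsc (shortened).
    static final String ADDRESS = "{\"type\":\"record\",\"name\":\"AddressRecord\","
        + "\"namespace\":\"com.company.model\",\"fields\":["
        + "{\"name\":\"streetaddress\",\"type\":\"string\"},"
        + "{\"name\":\"city\",\"type\":\"string\"}]}";

    // Stand-in for customer.avsc: refers to AddressRecord only by its full name.
    static final String CUSTOMER = "{\"type\":\"record\",\"name\":\"Customer\","
        + "\"namespace\":\"com.company.model\",\"fields\":["
        + "{\"name\":\"email\",\"type\":\"string\"},"
        + "{\"name\":\"address\",\"type\":{\"type\":\"array\","
        + "\"items\":\"com.company.model.AddressRecord\"}}]}";

    static Schema parseCustomer() {
        Schema.Parser parser = new Schema.Parser();
        parser.parse(ADDRESS);         // defines AddressRecord in this parser
        return parser.parse(CUSTOMER); // resolves the name via the same parser
    }

    public static void main(String[] args) {
        Schema addressField = parseCustomer().getField("address").schema();
        System.out.println(addressField.getType());                      // ARRAY
        System.out.println(addressField.getElementType().getFullName()); // com.company.model.AddressRecord
    }
}
```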
Princey James answered Sep 26 '22 22:09


An addition to @Princey James's answer:

The single-file approach (example 2) works for serializing and deserializing with code generation, but not without code generation. You will get

org.apache.avro.AvroRuntimeException: Not a record schema: [{"type":" ...

because parsing the combined file returns a union of the records rather than a single record schema.
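The error comes from the GenericData.Record constructor, which rejects any schema whose type is not RECORD. A minimal sketch reproducing it, assuming a shortened inline version of the combined file (one record only; class and method names are hypothetical), and showing that a record branch taken from the union with Schema.getTypes() is accepted:

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;

class NotARecordSketch {
    // Shortened combined file: a JSON array, so it parses as a UNION.
    static final String COMBINED = "[{\"type\":\"record\",\"name\":\"AddressRecord\","
        + "\"namespace\":\"com.company.model\","
        + "\"fields\":[{\"name\":\"city\",\"type\":\"string\"}]}]";

    /** Returns true if building a GenericData.Record from the union fails. */
    static boolean unionIsRejected() {
        Schema union = new Schema.Parser().parse(COMBINED);
        try {
            new GenericData.Record(union);
            return false;
        } catch (AvroRuntimeException e) {
            // Message starts with "Not a record schema: [..."
            return true;
        }
    }

    /** Builds a record from the union's first branch, which is a RECORD schema. */
    static GenericData.Record fromBranch() {
        Schema union = new Schema.Parser().parse(COMBINED);
        Schema address = union.getTypes().get(0); // branches follow the JSON array order
        GenericData.Record rec = new GenericData.Record(address);
        rec.put("city", "Paris");
        return rec;
    }

    public static void main(String[] args) {
        System.out.println(unionIsRejected());        // true
        System.out.println(fromBranch().get("city")); // Paris
    }
}
```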

Working example with code generation:

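import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import org.junit.Test; // JUnit 4 assumed

  // Customer, UserPerso and AddressRecord are classes generated from the schema.
  // The schema used here (not shown) also defines a UserPerso record and a "user" field.
  @Test
  public void avroWithCode() throws IOException {

    UserPerso user = UserPerso.newBuilder()
                              .setName("Charlie")
                              .setFavoriteColor("blue")
                              .setFavoriteNumber(null)
                              .build();

    AddressRecord address = AddressRecord.newBuilder()
                                         .setStreetaddress("mo")
                                         .setCity("Paris")
                                         .setState("IDF")
                                         .setZip("75")
                                         .build();

    List<AddressRecord> addresses = new ArrayList<>();
    addresses.add(address);

    Customer cust = Customer.newBuilder()
                            .setUser(user)
                            .setPhone("0101010101")
                            .setAddress(addresses)
                            .build();

    File file = new File("cust.avro");

    // Write the record to an Avro container file
    DatumWriter<Customer> customerDatumWriter = new SpecificDatumWriter<>(Customer.class);
    try (DataFileWriter<Customer> dataFileWriter = new DataFileWriter<>(customerDatumWriter)) {
      dataFileWriter.create(cust.getSchema(), file);
      dataFileWriter.append(cust);
    }

    // Read it back, reusing the same Customer instance for each record
    DatumReader<Customer> custDatumReader = new SpecificDatumReader<>(Customer.class);
    try (DataFileReader<Customer> dataFileReader = new DataFileReader<>(file, custDatumReader)) {
      Customer cust2 = null;
      while (dataFileReader.hasNext()) {
        cust2 = dataFileReader.next(cust2);
        System.out.println(cust2);
      }
    }
  }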

Without code generation (this version fails with the error above):

  @Test
  public void avroWithoutCode() throws IOException {

    // user.avsc is the single combined file (a JSON array of records), so each
    // parse call returns a UNION schema, not a record schema
    Schema schemaUserPerso = new Schema.Parser().parse(new File("src/main/resources/avroTest/user.avsc"));
    Schema schemaAddress = new Schema.Parser().parse(new File("src/main/resources/avroTest/user.avsc"));
    Schema schemaCustomer = new Schema.Parser().parse(new File("src/main/resources/avroTest/user.avsc"));

    System.out.println(schemaUserPerso);

    // Fails here with AvroRuntimeException: Not a record schema: [{"type":" ...
    GenericRecord user = new GenericData.Record(schemaUserPerso);
    user.put("name", "Charlie");
    user.put("favorite_color", "blue");
    user.put("favorite_number", null);

    GenericRecord address = new GenericData.Record(schemaAddress);
    address.put("streetaddress", "mo");
    address.put("city", "Paris");
    address.put("state", "IDF");
    address.put("zip", "75");

    List<GenericRecord> addresses = new ArrayList<>();
    addresses.add(address);

    GenericRecord cust = new GenericData.Record(schemaCustomer);
    cust.put("user", user);
    cust.put("phone", "0101010101");
    cust.put("address", addresses);

    File file = new File("cust.avro");

    DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<>(schemaCustomer);
    try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<>(datumWriter)) {
      dataFileWriter.create(schemaCustomer, file);
      dataFileWriter.append(cust);
    }

    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schemaCustomer);
    try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(file, datumReader)) {
      GenericRecord cust2 = null;
      while (dataFileReader.hasNext()) {
        cust2 = dataFileReader.next(cust2);
        System.out.println(cust2);
      }
    }
  }
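For comparison, a sketch of a generic (no code generation) round trip that does work with a combined file, assuming the individual record schemas are looked up by full name through Schema.Parser.getTypes() instead of using the union returned by parse(). The schema is a shortened inline string, the data goes to an in-memory byte array instead of cust.avro, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

class GenericRoundTripSketch {
    // Shortened combined file: AddressRecord plus a Customer holding an array of them.
    static final String COMBINED = "["
        + "{\"type\":\"record\",\"name\":\"AddressRecord\",\"namespace\":\"com.company.model\","
        + "\"fields\":[{\"name\":\"city\",\"type\":\"string\"}]},"
        + "{\"type\":\"record\",\"name\":\"Customer\",\"namespace\":\"com.company.model\","
        + "\"fields\":[{\"name\":\"phone\",\"type\":\"string\"},"
        + "{\"name\":\"address\",\"type\":{\"type\":\"array\","
        + "\"items\":\"com.company.model.AddressRecord\"}}]}]";

    static GenericRecord roundTrip() {
        Schema.Parser parser = new Schema.Parser();
        parser.parse(COMBINED);
        // Use the individual RECORD schemas, not the UNION returned by parse().
        Schema addressSchema = parser.getTypes().get("com.company.model.AddressRecord");
        Schema customerSchema = parser.getTypes().get("com.company.model.Customer");

        GenericRecord address = new GenericData.Record(addressSchema);
        address.put("city", "Paris");
        GenericRecord customer = new GenericData.Record(customerSchema);
        customer.put("phone", "0101010101");
        customer.put("address", Collections.singletonList(address));

        try {
            // Serialize with the Customer record schema.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(customerSchema).write(customer, encoder);
            encoder.flush();

            // Deserialize with the same schema (strings come back as Utf8).
            BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            return new GenericDatumReader<GenericRecord>(customerSchema).read(null, decoder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

Reading the whole union back as one schema is what fails; picking the named record out of the parser keeps the single-file layout while giving GenericData.Record the RECORD schema it requires.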
raphaelauv answered Sep 22 '22 22:09