Can I split an Apache Avro schema across multiple files?

Tags: avro

I can do,

    {
        "type": "record",
        "name": "Foo",
        "fields": [
            {"name": "bar", "type": {
                "type": "record",
                "name": "Bar",
                "fields": [ ]
            }}
        ]
    }

and that works fine, but supposing I want to split the schema up into two files such as:

    {
        "type": "record",
        "name": "Foo",
        "fields": [
            {"name": "bar", "type": "Bar"}
        ]
    }

    {
        "type": "record",
        "name": "Bar",
        "fields": [ ]
    }

Does Avro have the capability to do this?

Owen asked Feb 03 '14 22:02




2 Answers

Yes, it's possible.

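I've done this in my Java project by listing the shared schema files under imports in the avro-maven-plugin configuration, so that they are compiled first and the other schemas can refer to them by name. Example: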

search_result.avro:

    {
        "namespace": "com.myorg.other",
        "type": "record",
        "name": "SearchResult",
        "fields": [
            {"name": "type", "type": "SearchResultType"},
            {"name": "keyWord", "type": "string"},
            {"name": "searchEngine", "type": "string"},
            {"name": "position", "type": "int"},
            {"name": "userAction", "type": "UserAction"}
        ]
    }

search_suggest.avro:

    {
        "namespace": "com.myorg.other",
        "type": "record",
        "name": "SearchSuggest",
        "fields": [
            {"name": "suggest", "type": "string"},
            {"name": "request", "type": "string"},
            {"name": "searchEngine", "type": "string"},
            {"name": "position", "type": "int"},
            {"name": "userAction", "type": "UserAction"},
            {"name": "timestamp", "type": "long"}
        ]
    }

user_action.avro:

    {
        "namespace": "com.myorg.other",
        "type": "enum",
        "name": "UserAction",
        "symbols": ["S", "V", "C"]
    }

search_result_type.avro:

    {
        "namespace": "com.myorg.other",
        "type": "enum",
        "name": "SearchResultType",
        "symbols": ["O", "S", "A"]
    }

avro-maven-plugin configuration:

    <plugin>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro-maven-plugin</artifactId>
        <version>1.7.4</version>
        <executions>
            <execution>
                <phase>generate-sources</phase>
                <goals>
                    <goal>schema</goal>
                </goals>
                <configuration>
                    <sourceDirectory>${project.basedir}/src/main/resources/avro</sourceDirectory>
                    <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                    <includes>
                        <include>**/*.avro</include>
                    </includes>
                    <imports>
                        <import>${project.basedir}/src/main/resources/avro/user_action.avro</import>
                        <import>${project.basedir}/src/main/resources/avro/search_result_type.avro</import>
                    </imports>
                </configuration>
            </execution>
        </executions>
    </plugin>
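
To see why the two enum files are listed as imports, here is a rough Python sketch of the order dependence: a named type has to be defined before another schema can refer to it by name. This is only an illustration, not how Avro or the plugin is implemented. The names `register`, `full_name` and `PRIMITIVES` are made up for this example, and it ignores aliases, logical types and redefinition checks.

```python
import json

# Avro's primitive type names; anything else used as a bare string is a reference.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def full_name(name, namespace):
    """Qualify a short name with the enclosing namespace, if there is one."""
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def register(schema, known, namespace=None):
    """Add the named types defined in `schema` to `known` (hypothetical helper).

    Raises KeyError when the schema refers to a name that is not defined yet.
    """
    if isinstance(schema, str):
        if schema not in PRIMITIVES and full_name(schema, namespace) not in known:
            raise KeyError(f"undefined type: {schema}")
    elif isinstance(schema, list):
        # A JSON array is a union: check every branch in order.
        for branch in schema:
            register(branch, known, namespace)
    else:
        kind = schema["type"]
        if kind in ("record", "enum", "fixed"):
            name = full_name(schema["name"], schema.get("namespace", namespace))
            known.add(name)
            # Fields inherit the namespace of the enclosing named type.
            namespace = name.rpartition(".")[0] or None
            for field in schema.get("fields", []):
                register(field["type"], known, namespace)
        elif kind == "array":
            register(schema["items"], known, namespace)
        elif kind == "map":
            register(schema["values"], known, namespace)
        else:
            register(kind, known, namespace)


user_action = json.loads(
    '{"namespace": "com.myorg.other", "type": "enum",'
    ' "name": "UserAction", "symbols": ["S", "V", "C"]}'
)
search_result_type = json.loads(
    '{"namespace": "com.myorg.other", "type": "enum",'
    ' "name": "SearchResultType", "symbols": ["O", "S", "A"]}'
)
search_result = json.loads(
    '{"namespace": "com.myorg.other", "type": "record", "name": "SearchResult",'
    ' "fields": ['
    '{"name": "type", "type": "SearchResultType"},'
    '{"name": "keyWord", "type": "string"},'
    '{"name": "userAction", "type": "UserAction"}]}'
)

# Same order as the plugin config: the imported enums first, then the record.
known = set()
for schema in (user_action, search_result_type, search_result):
    register(schema, known)
print(sorted(known))
```

Passing `search_result` first, with an empty `known` set, raises `KeyError` in this sketch because `SearchResultType` has not been registered yet. That is the situation the imports list is there to avoid.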

AlexTiunov answered Sep 17 '22 21:09


You can also define multiple schemas inside of one file:

schemas.avsc:

    [
        {
            "type": "record",
            "name": "Bar",
            "fields": [ ]
        },
        {
            "type": "record",
            "name": "Foo",
            "fields": [
                {"name": "bar", "type": "Bar"}
            ]
        }
    ]

This is not ideal if you want to reuse the schemas in several places, but in my opinion it improves readability and maintainability considerably.

Michael answered Sep 20 '22 21:09