
Apache Avro: map uses CharSequence as key

Tags: java, avro

I am using Apache Avro.

My schema has a map type:

{"name": "MyData",
  "type": {"type": "map",
            "values": {
                "type": "record",
                "name": "Person",
                "fields": [
                    {"name": "name", "type": "string"},
                    {"name": "age", "type": "int"}
                ]
            }
          }
}

After compiling the schema, the generated Java class uses CharSequence as the key type for the map MyData.

Using CharSequence as a Map key is very inconvenient. Is there a way to make Apache Avro generate String keys for the Map?

P.S.

The problem is that, for example, dataMap.containsKey("SOME_KEY") returns false even though the key is present, simply because the stored key is a CharSequence rather than a String. Besides, putting a map entry with an existing key doesn't replace the old one. That's why I say it is inconvenient to use CharSequence as the key.

Mellon asked Nov 01 '13 14:11


5 Answers

This JIRA discussion is relevant. The main reason CharSequence is still used is backward compatibility.

As Charles Forsythe pointed out, a workaround has been added for cases where String is necessary: set the string property in the schema.

 { "type": "string", "avro.java.string": "String" }

The default string type is Avro's own Utf8 class. Besides specifying the property manually or using the pom.xml setting, avro-tools also has a compile option for it, -string:

java -jar avro-tools-1.7.5.jar compile -string schema /path/to/schema .
Alex A. answered Oct 06 '22 03:10


Apparently, there is a workaround for this problem as of Avro 1.6. You specify the string type in the Avro Maven plugin configuration in your project's POM file:

  <stringType>String</stringType>

This is mentioned in issue AVRO-803, though the plugin's web documentation doesn't reflect it.

Stephen C answered Oct 06 '22 04:10


Apparently, Avro uses CharSequence by default. I found a way to configure it to convert to String:

From Avro 1.6.0 onward, there is an option to have Avro always perform the conversion to String. There are a couple of ways to achieve this. The first is to set the avro.java.string property in the schema to String:

         { "type": "string", "avro.java.string": "String" }

I have not tested this.

Charles Forsythe answered Oct 06 '22 04:10


I think explicitly converting the String to Utf8 will work: wrap the key, e.g. "some_key" -> new Utf8("some_key"), and use that as your key for the map.
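
A minimal sketch of why the wrapped lookup can succeed where the plain String lookup fails. To keep it runnable without Avro on the classpath, Utf8Like is a hypothetical stand-in for org.apache.avro.util.Utf8: it is assumed to mirror the relevant behaviour (a CharSequence that only equals instances of its own class), and it is not Avro's actual implementation. Utf8KeyLookup and sampleMap are likewise illustrative names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.avro.util.Utf8 (assumption, not Avro code).
// It is a CharSequence whose equals() only accepts other Utf8Like instances.
final class Utf8Like implements CharSequence {
    private final String text;
    private final byte[] bytes;

    Utf8Like(String s) {
        this.text = s;
        this.bytes = s.getBytes(StandardCharsets.UTF_8);
    }

    @Override public int length() { return text.length(); }
    @Override public char charAt(int index) { return text.charAt(index); }
    @Override public CharSequence subSequence(int start, int end) { return text.subSequence(start, end); }
    @Override public String toString() { return text; }

    @Override public boolean equals(Object o) {
        return o instanceof Utf8Like && Arrays.equals(bytes, ((Utf8Like) o).bytes);
    }

    @Override public int hashCode() { return Arrays.hashCode(bytes); }
}

class Utf8KeyLookup {
    // Builds a map the way a decoded record might hold it: keys are Utf8Like, not String.
    static Map<CharSequence, Integer> sampleMap() {
        Map<CharSequence, Integer> dataMap = new HashMap<>();
        dataMap.put(new Utf8Like("SOME_KEY"), 1);
        return dataMap;
    }

    public static void main(String[] args) {
        Map<CharSequence, Integer> dataMap = sampleMap();

        // String.equals() rejects any non-String, so this lookup misses.
        System.out.println(dataMap.containsKey("SOME_KEY"));               // false

        // Wrapping the key in the same type as the stored keys makes equals() match.
        System.out.println(dataMap.containsKey(new Utf8Like("SOME_KEY"))); // true

        // A String put does not replace the existing entry; it adds a second one.
        dataMap.put("SOME_KEY", 2);
        System.out.println(dataMap.size());                                // 2
    }
}
```

With real Avro, the lookup line would presumably read dataMap.containsKey(new Utf8("SOME_KEY")), as suggested above.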

Jun answered Oct 06 '22 02:10


Regardless of whether it's possible to force Avro to use a String, using CharSequence directly is a bad implementation because CharSequence isn't Comparable<CharSequence> and doesn't even specify equality of two identical sequences. I suggest filing this as a bug against Avro.
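
The equality point can be illustrated with JDK types alone. The sketch below shows three CharSequence implementations holding the same characters that are not equal to each other, and a hypothetical helper, withStringKeys (not part of Avro or the JDK), that normalizes keys through toString() when a String-keyed view is needed.

```java
import java.nio.CharBuffer;
import java.util.HashMap;
import java.util.Map;

class CharSequenceEquality {
    // Hypothetical helper: copies a CharSequence-keyed map into a String-keyed one.
    // Keys with the same characters collapse into a single entry.
    static <V> Map<String, V> withStringKeys(Map<? extends CharSequence, V> source) {
        Map<String, V> copy = new HashMap<>();
        for (Map.Entry<? extends CharSequence, V> entry : source.entrySet()) {
            copy.put(entry.getKey().toString(), entry.getValue());
        }
        return copy;
    }

    public static void main(String[] args) {
        CharSequence a = "abc";
        CharSequence b = new StringBuilder("abc");
        CharSequence c = CharBuffer.wrap("abc");

        // Same characters, but equals() is defined per implementation.
        System.out.println(a.equals(b));            // false
        System.out.println(a.equals(c));            // false

        // Content comparison has to be requested explicitly.
        System.out.println("abc".contentEquals(b)); // true

        Map<CharSequence, Integer> byCharSequence = new HashMap<>();
        byCharSequence.put(b, 1);
        System.out.println(byCharSequence.containsKey("abc"));                 // false
        System.out.println(withStringKeys(byCharSequence).containsKey("abc")); // true
    }
}
```

Copying the map costs an extra pass over the entries, so this is only a fallback for code that cannot change how the classes are generated.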

chrylis -cautiouslyoptimistic- answered Oct 06 '22 04:10