How can I use proto3 with Hadoop/Spark?

I've got several .proto files which rely on syntax = "proto3";. I also have a Maven project that is used to build Hadoop/Spark jobs (Hadoop 2.7.1 and Spark 1.5.2). I'd like to generate data in Hadoop/Spark and then serialize it according to my proto3 files.

Using libprotoc 3.0.0, I generate Java sources which work fine within my Maven project as long as I have the following in my pom.xml:

<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>
  <version>3.0.0-beta-1</version>
</dependency>  

Now, when I use my libprotoc-generated classes in a job that gets deployed to a cluster I get hit with:

java.lang.VerifyError : class blah overrides final method mergeUnknownFields.(Lcom/google/protobuf/UnknownFieldSet;)Lcom/google/protobuf/GeneratedMessage$Builder;
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:760)

The class-loading failure makes sense: Hadoop/Spark depend on protobuf-java 2.5.0, which is incompatible with my 3.0.0-beta-1. I also noticed that protobuf classes (presumably versions < 3) have found their way into my jar in a few other places:

$ jar tf target/myjar-0.1-SNAPSHOT.jar | grep protobuf | grep '/$'
org/apache/hadoop/ipc/protobuf/
org/jboss/netty/handler/codec/protobuf/
META-INF/maven/com.google.protobuf/
META-INF/maven/com.google.protobuf/protobuf-java/
org/apache/mesos/protobuf/
io/netty/handler/codec/protobuf/
com/google/protobuf/
google/protobuf/

Is there something I can do (Maven Shade?) to sort this out?

Similar issue here: Spark java.lang.VerifyError

dranxo asked Dec 28 '15 05:12


2 Answers

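It turns out this is documented in the Maven Shade plugin's class-relocation guide: https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html

Relocation renames the com.google.protobuf package bundled in my jar and rewrites the references to it in my generated classes, so they no longer clash with the protobuf-java 2.5.0 that Hadoop/Spark bring along. Relocating the protobuf classes makes the VerifyError go away: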

          <relocations>
            <relocation>
              <pattern>com.google.protobuf</pattern>
              <shadedPattern>shaded.com.google.protobuf</shadedPattern>
            </relocation>
          </relocations>
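
For context, here is a minimal sketch of where that `<relocations>` block could sit in a pom.xml. It follows the layout used in the Maven Shade plugin documentation; the plugin version and the `package` phase binding are assumptions rather than details from the original answer, so adjust them to match your build.

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <!-- Assumed version; use whatever your build already pins. -->
      <version>3.2.4</version>
      <executions>
        <execution>
          <!-- Run the shade goal when the jar is packaged. -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <!-- Move the bundled protobuf 3 classes out of the way of
                   the protobuf-java 2.5.0 that Hadoop/Spark provide. -->
              <relocation>
                <pattern>com.google.protobuf</pattern>
                <shadedPattern>shaded.com.google.protobuf</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With a configuration along these lines, rerunning the `jar tf ... | grep protobuf` check from the question after `mvn package` should list the protobuf classes under `shaded/com/google/protobuf/`. This only works if protobuf-java is actually bundled into the shaded jar, i.e. it is not declared with `provided` scope.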

dranxo answered Oct 24 '22 16:10


Same solution as dranxo's, but with sbt-assembly:

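// protobufVersion is a val defined elsewhere in the build.
assemblyShadeRules in assembly := Seq(
  // Rename protobuf classes in both the project's own classes (inProject)
  // and the protobuf-java dependency (inLibrary); @1 is the wildcard match.
  ShadeRule.rename("com.google.protobuf.*" -> "shadedproto.@1").inProject
    .inLibrary("com.google.protobuf" % "protobuf-java" % protobufVersion)
)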

jrabary answered Oct 24 '22 16:10