I set up the Apache Spark Maven dependency in pom.xml as follows:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>0.9.1</version>
</dependency>
But I found that this dependency uses "hadoop-client-1.0.4.jar" and "hadoop-core-1.0.4.jar". When I run my program I get the error "org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4", which indicates that I need to switch the Hadoop version from 1.0.4 to 2.2.0.
Update:
Is the following a correct way to solve this problem?
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>0.9.1</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
</dependency>
Many thanks for your help.
Recompile Spark for your Hadoop version; see "A Note About Hadoop Versions" here: http://spark.apache.org/docs/0.9.1/. They conveniently give an example for 2.2.0:
SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
This will create a new jar, $SPARK_HOME/assembly/target/scala-2.10/spark-assembly-*.jar,
that you need to include in your pom.xml (instead of excluding Hadoop from the published jar).
If you're already hosting your own repository (e.g. on Nexus), upload it there (this is what I do and it works great). If for some reason you can't upload to any repository, use Maven's install:install-file
or one of the answers here: Maven: add a dependency to a jar by relative path
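For example, a minimal install:install-file sketch; the exact assembly jar file name and the groupId/artifactId/version coordinates below are placeholders you would choose yourself:
mvn install:install-file \
  -Dfile=$SPARK_HOME/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.2.0.jar \
  -DgroupId=org.apache.spark \
  -DartifactId=spark-assembly_2.10 \
  -Dversion=0.9.1-hadoop2.2.0 \
  -Dpackaging=jar
You can then reference those coordinates as an ordinary <dependency> in your pom.xml.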
Spark 1.2.0 depends on Hadoop 2.2.0 by default. If you can update your Spark dependency to 1.2.0 (or newer), that will solve the problem.
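For example, a sketch of the dependency from the question with just the version bumped (with 1.2.0, the hadoop-client override and the exclusions should no longer be necessary):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.0</version>
</dependency>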