I set up the Apache Spark Maven dependency in pom.xml as follows:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>0.9.1</version>
</dependency>
But I found that this dependency uses "hadoop-client-1.0.4.jar" and "hadoop-core-1.0.4.jar". When I run my program I get the error "org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4", which indicates that I need to switch the Hadoop version from 1.0.4 to 2.2.0.
Update:
Is the following a correct way to solve this problem?
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>0.9.1</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
</dependency>
Many thanks for your help.
Recompile Spark for your Hadoop version; see "A Note About Hadoop Versions" here: http://spark.apache.org/docs/0.9.1/. They conveniently give an example for 2.2.0:
SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
This will create a new jar, $SPARK_HOME/assembly/target/scala-2.10/spark-assembly-*.jar,
that you need to include in your pom.xml (instead of excluding Hadoop from the published jar).
If you're already hosting your own repository (e.g. on Nexus), upload it there (this is what I do and it works great). If for some reason you can't upload to any repository, use Maven's install:install-file
or one of the answers here: Maven: add a dependency to a jar by relative path
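For example, a minimal install:install-file sketch; the exact assembly jar file name and the groupId/artifactId/version coordinates below are placeholders you would choose yourself:
mvn install:install-file \
  -Dfile=$SPARK_HOME/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.2.0.jar \
  -DgroupId=org.apache.spark \
  -DartifactId=spark-assembly_2.10 \
  -Dversion=0.9.1-hadoop2.2.0 \
  -Dpackaging=jar
You can then reference those coordinates as an ordinary <dependency> in your pom.xml.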
Spark 1.2.0 depends on Hadoop 2.2.0 by default. If you can update your Spark dependency to 1.2.0 (or newer), that will solve the problem.
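For example, a sketch of the dependency from the question with just the version bumped (with 1.2.0, the hadoop-client override and the exclusions should no longer be necessary):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.2.0</version>
</dependency>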