
Spark 0.9.1 on Hadoop 2.2.0 Maven dependency

I set up the Apache Spark Maven dependency in pom.xml as follows:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>0.9.1</version>
    </dependency>

But I found that this dependency pulls in "hadoop-client-1.0.4.jar" and "hadoop-core-1.0.4.jar", and when I run my program I get the error "org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4", which indicates that I need to switch the Hadoop version from 1.0.4 to 2.2.0.
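To see exactly which Hadoop artifacts Spark is pulling in transitively, you can print Maven's resolved dependency tree (a standard goal of the Maven Dependency Plugin), filtered to the Hadoop group:

    mvn dependency:tree -Dincludes=org.apache.hadoop

This makes it easy to confirm whether the 1.0.4 jars are still on the classpath after any exclusions are applied.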

Update:

Is the following a correct way to solve this problem?

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>0.9.1</version>
        <exclusions>
            <exclusion> 
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-core</artifactId>
            </exclusion>
            <exclusion> 
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
            </exclusion>
        </exclusions> 
    </dependency> 
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.2.0</version>
    </dependency> 

Many thanks for your help.

faustineinsun asked Oct 21 '22 07:10

2 Answers

Recompile Spark for your Hadoop version; see "A Note About Hadoop Versions" here: http://spark.apache.org/docs/0.9.1/ . They conveniently give an example for 2.2.0:

SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly

This will create a new jar, $SPARK_HOME/assembly/target/scala-2.10/spark-assembly-*jar, which you then include in your pom.xml (instead of excluding Hadoop from the published jar).

If you're already hosting your own repository (e.g. on Nexus), upload it there (this is what I do and it works great). If for some reason you can't upload to any repository, use Maven's install:install-file, or see one of the answers to "Maven: add a dependency to a jar by relative path".
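For the install:install-file route, the invocation looks roughly like the sketch below. The jar filename and the Maven coordinates (groupId/artifactId/version) are illustrative assumptions; substitute whatever name your assembly build actually produces and coordinates that make sense for your project:

    mvn install:install-file \
      -Dfile=$SPARK_HOME/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.2.0.jar \
      -DgroupId=org.apache.spark \
      -DartifactId=spark-assembly_2.10 \
      -Dversion=0.9.1-hadoop2.2.0 \
      -Dpackaging=jar

After this, you can reference those same coordinates as a normal dependency in pom.xml.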

dranxo answered Oct 23 '22 04:10

Spark 1.2.0 depends on Hadoop 2.2.0 by default. If you can update your Spark dependency to 1.2.0 (or newer), that will solve the problem.
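Concretely, that would mean bumping the version in the original dependency, with no exclusions needed (a minimal sketch, assuming the same Scala 2.10 artifact as in the question):

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.2.0</version>
    </dependency>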

Tobber answered Oct 23 '22 06:10