
java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.reloadExistingConfigurations()V

It looks like I am stuck again on running a packaged Spark app jar with spark-submit. Following is my pom file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <parent>
        <artifactId>oneview-forecaster</artifactId>
        <groupId>com.dataxu.oneview.forecast</groupId>
        <version>1.0.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <artifactId>forecaster</artifactId>

<dependencies>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.module</groupId>
        <artifactId>jackson-module-scala_${scala.binary.version}</artifactId>
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.binary.version}</artifactId>
        <version>${spark.version}</version>
        <!--<scope>provided</scope>-->
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>2.2.0</version>
        <!--<scope>provided</scope>-->
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-aws</artifactId>
        <version>2.8.3</version>
        <!--<scope>provided</scope>-->
    </dependency>
    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk</artifactId>
        <version>1.10.60</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/joda-time/joda-time -->
    <dependency>
        <groupId>joda-time</groupId>
        <artifactId>joda-time</artifactId>
        <version>2.9.9</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.8.0</version>
        <!--<scope>provided</scope>-->
    </dependency>
</dependencies>

<build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>${scala-maven-plugin.version}</version>
            <executions>
                <execution>
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>com.dataxu.oneview.forecaster.App</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
</project>

Following is a simple snippet of code which fetches data from an S3 location and parses it as JSON:

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

def getS3Data(path: String): Map[String, Any] = {
    println("spark session start.........")
    val spark = getSparkSession()

    // Read the file(s) at the S3 path and concatenate all lines into one string
    val configTxt = spark.sparkContext.textFile(path)
        .collect().reduce(_ + _)

    // Deserialize the JSON text into a Scala Map
    val mapper = new ObjectMapper
    mapper.registerModule(DefaultScalaModule)
    mapper.readValue(configTxt, classOf[Map[String, String]])
}
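(getSparkSession() is a small helper not shown above. A minimal sketch of what such a helper might look like, where the app name and the commented-out s3a credential settings are illustrative assumptions, not the exact code:)

import org.apache.spark.sql.SparkSession

def getSparkSession(): SparkSession = {
    val spark = SparkSession.builder()
        .appName("forecaster")
        .getOrCreate()
    // Outside EMR, s3a credentials would typically be configured here, e.g.:
    // spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "<access key>")
    // spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "<secret key>")
    spark
}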

When I run it from IntelliJ, everything works fine; the log is clean and looks good. However, when I package it with mvn package and run it with spark-submit (roughly as sketched below), the job fails at the .collect().reduce(_ + _) call.
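A rough sketch of the launch command (the main class comes from the pom; the jar name is inferred from the artifactId, version, and assembly descriptor, and any cluster flags are omitted):

spark-submit --class com.dataxu.oneview.forecaster.App \
    target/forecaster-1.0.0-SNAPSHOT-jar-with-dependencies.jar

Following is the error I encounter: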

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.reloadExistingConfigurations()V
at org.apache.hadoop.fs.s3a.S3AFileSystem.addDeprecatedKeys(S3AFileSystem.java:181)
at org.apache.hadoop.fs.s3a.S3AFileSystem.<clinit>(S3AFileSystem.java:185)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
...

I do not understand which dependency was not packaged, or what else might be the issue, since I did set the versions expecting that hadoop-aws would pull in everything it needs.

Any help would be appreciated.

asked Mar 07 '18 by Omkar


1 Answer

The dependencies between Hadoop and the AWS SDK are very sensitive, and you should stick to the exact versions that your Hadoop dependency was built with.

The first problem you need to solve is to pick a single version of Hadoop: you're currently mixing 2.8.3 (hadoop-aws) and 2.8.0 (hadoop-common).

When I look at the dependency tree for org.apache.hadoop:hadoop-aws:2.8.0, I see that it was built against version 1.10.6 of the AWS SDK (the same is true for hadoop-aws:2.8.3), while your pom declares aws-java-sdk 1.10.60.

[image: Maven dependency tree for hadoop-aws showing its AWS SDK 1.10.6 dependency]
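You can reproduce that tree yourself with the standard Maven goal, filtered to the relevant groups (run from the module's directory):

mvn dependency:tree -Dincludes=org.apache.hadoop,com.amazonaws

This shows which AWS SDK artifacts each hadoop-aws version pulls in transitively.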

This mismatch is probably what's causing the failure: S3AFileSystem from hadoop-aws 2.8.3 calls Configuration.reloadExistingConfigurations(), and if an older hadoop-common (for example the one bundled with your Spark distribution) wins on the runtime classpath, its Configuration class lacks that method, producing exactly this NoSuchMethodError. So:

  • Choose the version of Hadoop you want to use
  • Include hadoop-aws with the same version as your other Hadoop artifacts
  • Remove the other AWS dependencies, or pin them to the versions your hadoop-aws version was built against (see the sketch after this list)
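For example, a minimal sketch of the aligned dependency block, assuming you standardize on Hadoop 2.8.3 (the explicit aws-java-sdk entry is dropped so that hadoop-aws pulls in the SDK artifacts it was built against, i.e. 1.10.6):

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.8.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>2.8.3</version>
</dependency>

If you still need AWS SDK classes directly, match the SDK version that hadoop-aws declares rather than picking one independently.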
answered Nov 15 '22 by ernest_k