It looks like I am stuck again on running a packaged Spark app jar using spark-submit. Following is my pom file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <parent>
        <artifactId>oneview-forecaster</artifactId>
        <groupId>com.dataxu.oneview.forecast</groupId>
        <version>1.0.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <artifactId>forecaster</artifactId>

    <dependencies>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.module</groupId>
            <artifactId>jackson-module-scala_${scala.binary.version}</artifactId>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>2.2.0</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-aws</artifactId>
            <version>2.8.3</version>
            <!--<scope>provided</scope>-->
        </dependency>
        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk</artifactId>
            <version>1.10.60</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/joda-time/joda-time -->
        <dependency>
            <groupId>joda-time</groupId>
            <artifactId>joda-time</artifactId>
            <version>2.9.9</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.8.0</version>
            <!--<scope>provided</scope>-->
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>${scala-maven-plugin.version}</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>com.dataxu.oneview.forecaster.App</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
Following is a simple snippet of code which fetches data from an S3 location and prints it:
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

def getS3Data(path: String): Map[String, Any] = {
  println("spark session start.........")
  val spark = getSparkSession()
  // Read the file(s) at the given S3 path and concatenate the lines into one string
  val configTxt = spark.sparkContext.textFile(path)
    .collect().reduce(_ + _)
  // Deserialize the JSON text into a Scala Map
  val mapper = new ObjectMapper
  mapper.registerModule(DefaultScalaModule)
  mapper.readValue(configTxt, classOf[Map[String, String]])
}
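For reference, getSparkSession() is not shown above; assume it is just a thin, hypothetical wrapper around SparkSession.builder, along these lines:

import org.apache.spark.sql.SparkSession

// Hypothetical helper (not part of the original post): builds or reuses a SparkSession.
// The real project may set the master, app name, and S3 credentials differently.
def getSparkSession(): SparkSession =
  SparkSession.builder()
    .appName("forecaster")
    .getOrCreate()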
When I run it from IntelliJ, everything works fine; the log is clean and looks good. However, when I package it using mvn package and try to run it with spark-submit, it fails at .collect().reduce(_ + _). Following is the error I encounter:
"main" java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.reloadExistingConfigurations()V
at org.apache.hadoop.fs.s3a.S3AFileSystem.addDeprecatedKeys(S3AFileSystem.java:181)
at org.apache.hadoop.fs.s3a.S3AFileSystem.<clinit>(S3AFileSystem.java:185)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
...
I don't understand which dependency was not packaged or what the issue might be, since I did set the versions, expecting hadoop-aws to pull in everything it needs.
Any help will be appreciated.
The dependencies between Hadoop and the AWS SDK are very sensitive, and you should stick to the versions that your Hadoop dependency was built with.
The first problem you need to solve is to pick one version of Hadoop. I see you're mixing versions: hadoop-aws is at 2.8.3 while hadoop-common is at 2.8.0.
When I look at the dependency tree for org.apache.hadoop:hadoop-aws:2.8.0, I see that it is built against version 1.10.6 of the AWS SDK (the same holds for hadoop-aws:2.8.3).
This is probably what's causing the mismatch (you're mixing incompatible versions); note that your pom also pins aws-java-sdk to 1.10.60 rather than 1.10.6. So: pick the Hadoop version you want to use, include hadoop-aws with the version compatible with that Hadoop version, and keep the AWS SDK at the version hadoop-aws was built against. You can check what each artifact pulls in with mvn dependency:tree.
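As a minimal sketch, assuming you standardize on Hadoop 2.8.3, the relevant dependency entries might look like the following (the versions are illustrative; confirm them against the dependency tree for your module):

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.8.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>2.8.3</version>
</dependency>
<!-- 1.10.6 is the AWS SDK version the dependency tree shows for hadoop-aws 2.8.x;
     alternatively, drop this entry and let hadoop-aws pull the SDK in transitively -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk</artifactId>
    <version>1.10.6</version>
</dependency>

Also, if your cluster already provides the Hadoop and AWS jars at runtime, marking these as provided (as the commented-out scopes in your pom suggest) may avoid shipping a second, conflicting copy inside the assembled jar.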