 

Apache Spark: akka.version error when building a jar with all dependencies

I have built a jar file from my Spark app with Maven (mvn clean compile assembly:single) and the following pom file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>mgm.tp.bigdata</groupId>
  <artifactId>ma-spark</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>ma-spark</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.1.0-cdh5.2.5</version>
    </dependency>
    <dependency>
        <groupId>mgm.tp.bigdata</groupId>
        <artifactId>ma-commons</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </dependency>
  </dependencies>

  <build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <mainClass>mgm.tp.bigdata.ma_spark.SparkMain</mainClass>
          </manifest>
        </archive>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
    </plugin>
  </plugins>
</build>
</project>

If I run my app with java -jar ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar in the terminal, I get the following error message:

VirtualBox:~/Schreibtisch$ java -jar ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar
2015-Jun-02 12:53:36,348 [main] org.apache.spark.util.Utils
 WARN  - Your hostname, proewer-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface eth0)
2015-Jun-02 12:53:36,350 [main] org.apache.spark.util.Utils
 WARN  - Set SPARK_LOCAL_IP if you need to bind to another address
2015-Jun-02 12:53:36,401 [main] org.apache.spark.SecurityManager
 INFO  - Changing view acls to: proewer
2015-Jun-02 12:53:36,402 [main] org.apache.spark.SecurityManager
 INFO  - Changing modify acls to: proewer
2015-Jun-02 12:53:36,403 [main] org.apache.spark.SecurityManager
 INFO  - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(proewer); users with modify permissions: Set(proewer)
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
    at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:115)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:136)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:142)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:150)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:155)
    at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:197)
    at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:136)
    at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:470)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1454)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1450)
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:156)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:203)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
    at mgm.tp.bigdata.ma_spark.SparkMain.main(SparkMain.java:38)

What am I doing wrong?

Best regards, Paul



4 Answers

This is what you are doing wrong:

I run my app with java -jar ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Once you have built your application, you should launch it using the spark-submit script. This script takes care of setting up the classpath with Spark and its dependencies, and supports the different cluster managers and deploy modes that Spark offers:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

I strongly advise you to read the official documentation about Submitting Applications.
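
For this particular app, a minimal invocation could look like the following sketch (the main class comes from the pom's manifest entry; local[2] as the master URL is just an assumption for local testing):

./bin/spark-submit \
  --class mgm.tp.bigdata.ma_spark.SparkMain \
  --master local[2] \
  ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar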



This most likely happens because the Akka config file (reference.conf) from the Akka jar was overwritten or dropped while packaging the fat jar.

You can try another plugin, maven-shade-plugin, and in the pom.xml specify how to resolve conflicts between resources that share the same name. Below is an example:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <minimizeJar>false</minimizeJar>
        <createDependencyReducedPom>false</createDependencyReducedPom>
        <artifactSet>
          <includes>
            <!-- Include here the dependencies you want to be packed in your fat jar -->
            <include>my.package.etc....:*</include>
          </includes>
        </artifactSet>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>reference.conf</resource>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>

Note the <transformers> section, which instructs the shade plugin to append the contents of same-named reference.conf resources instead of replacing them.
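
Once the shaded jar is built, a quick sanity check with the Typesafe Config API can tell you whether the merge succeeded (a minimal sketch; the class name is made up for illustration, and it assumes the shaded jar is on the classpath):

import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class AkkaConfigCheck {
    public static void main(String[] args) {
        // ConfigFactory.load() merges every reference.conf visible on the classpath;
        // a fat jar can hold only one entry with that name, so it must contain the
        // appended content from all the original jars
        Config config = ConfigFactory.load();
        System.out.println("akka.version = " + config.getString("akka.version"));
    }
}

If this prints a version, the reference.conf files were merged correctly; if it throws ConfigException$Missing, they were not.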



This worked for me.

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>1.5</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>allinone</shadedClassifierName>
        <artifactSet>
          <includes>
            <include>*:*</include>
          </includes>
        </artifactSet>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>reference.conf</resource>
          </transformer>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>META-INF/spring.handlers</resource>
          </transformer>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>META-INF/spring.schemas</resource>
          </transformer>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <manifestEntries>
              <Main-Class>com.echoed.chamber.Main</Main-Class>
            </manifestEntries>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>


A ConfigException$Missing error indicates that the Akka config file, i.e. the reference.conf file, is not bundled in the application jar. The reason could be that when multiple files with the same name are available in different dependent jars, the default strategy checks whether they are all the same; if they are not, it omits that file.

I had the same issue and I resolved it as follows:

Generate a merged reference.conf using the AppendingTransformer: by a merged reference.conf file I mean that all the dependent modules, such as akka-core, akka-http, akka-remoting etc., that contain a resource named reference.conf are appended together by the AppendingTransformer. We add the AppendingTransformer to the pom file as follows:

 <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
     <resource>reference.conf</resource>
 </transformer>

mvn clean install will now generate a fat jar with the merged reference.conf file.
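
You can verify the merge before deploying (a quick check; path_to_application.jar is a placeholder for your fat jar):

# jars are zip archives, so unzip -p prints the bundled reference.conf;
# the akka block in the output should contain a version key
unzip -p path_to_application.jar reference.conf | grep -n "version"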

Still the same error: spark-submit <main-class> <app.jar> gave the same error when I deployed my Spark app on EMR.

Reason: since HDFS is the configured filesystem, Spark jobs on an EMR cluster read from HDFS by default. So the file you want to use must already exist in HDFS. I added the reference.conf file to HDFS using the following approach:

1. Extract the reference.conf file from app.jar into the /tmp folder:
    `cd /tmp`
    `jar xvf path_to_application.jar reference.conf`
2. Copy the extracted reference.conf from the local path (in this case /tmp) to an HDFS path (e.g. /user/hadoop):
    `hdfs dfs -put /tmp/reference.conf /user/hadoop`
3. Load the config as follows:
   `val parsedConfig = ConfigFactory.parseFile(new File("/user/hadoop/reference.conf"))`
   `val config = ConfigFactory.load(parsedConfig)`

Alternate solution:

  • Extract the reference.conf file from app.jar and copy it to all the nodes of the EMR cluster at the same path for both drivers and executors.
  • ConfigFactory.parseFile(new File("/tmp/reference.conf")) will now read reference.conf from the local file system. Hope that helps and saves you some debugging time!
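
A related shortcut (my addition, not part of the original answer): spark-submit's --files option places the listed files in the working directory of each executor, which avoids copying reference.conf to every node by hand:

spark-submit --class <main-class> --files /tmp/reference.conf <app.jar>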