We are trying to build a fat jar containing one small Scala source file and a ton of dependencies (a simple MapReduce example using Spark and Cassandra):
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import com.datastax.spark.connector._
import org.apache.spark.SparkConf

object VMProcessProject {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .set("spark.cassandra.connection.host", "127.0.0.1")
      .set("spark.executor.extraClassPath", "C:\\Users\\SNCUser\\dataquest\\ScalaProjects\\lib\\spark-cassandra-connector-assembly-1.3.0-M2-SNAPSHOT.jar")
    println("got config")
    val sc = new SparkContext("spark://US-L15-0027:7077", "test", conf)
    println("Got spark context")

    val rdd = sc.cassandraTable("test_ks", "test_col")
    println("Got RDDs")
    println(rdd.count())
    val newRDD = rdd.map(x => 1)
    val count1 = newRDD.reduce((x, y) => x + y)
  }
}
We do not have a build.sbt file; instead we put the jars into a lib folder and the source files in src/main/scala, and run with sbt run. Our assembly.sbt file looks as follows:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
When we run sbt assembly we get the following error message:
...
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: java heap space
at java.util.concurrent...
We're not sure how to change the JVM settings to increase the memory, since we are using sbt assembly to make the jar. Also, if there is something egregiously wrong with how we are writing the code or building our project, that would help us out a lot too; there have been so many headaches trying to set up a basic Spark program!
sbt is essentially a Java process, so you can tune its runtime heap size to address the OutOfMemory issue.
For 0.13.x, the default memory options sbt uses are:
-Xms1024m -Xmx1024m -XX:ReservedCodeCacheSize=128m -XX:MaxPermSize=256m
You can enlarge the heap size by running something like:
sbt -J-Xms2048m -J-Xmx2048m assembly
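If you would rather not pass the flags on every invocation, the standard sbt launcher script also reads JVM options from an SBT_OPTS environment variable or a .sbtopts file in the project root; exactly what is honored depends on how sbt was installed, so treat the following .sbtopts contents as a sketch (the -J prefix passes each option straight through to the JVM running sbt):
-J-Xms2048m
-J-Xmx2048m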
I was including Spark as an unmanaged dependency (putting the jar file in the lib folder), which used a lot of memory during assembly because it is a huge jar. Instead, I made a build.sbt file which included Spark as a provided, unmanaged dependency, so it is left out of the fat jar (a minimal example is sketched below).
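A minimal build.sbt along these lines might look like the following. It pulls Spark in as a managed dependency marked provided rather than keeping the jar in lib, which achieves the same goal of keeping Spark out of the assembly; the artifact names and version numbers here are assumptions and should be matched to whatever the cluster actually runs:

name := "VMProcessProject"

version := "0.1"

scalaVersion := "2.10.5"

// Spark is marked "provided" so sbt-assembly leaves it out of the fat jar;
// the cluster supplies it at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1" % "provided"

// The Cassandra connector, by contrast, is bundled into the assembly.
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.3.0-M1"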
Secondly, I created the environment variable JAVA_OPTS with the value -Xms256m -Xmx4g, which sets the minimum heap size to 256 megabytes while allowing the heap to grow to a maximum of 4 gigabytes. These two changes combined allowed me to create a jar file with sbt assembly.
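For reference, on a Unix-like shell that variable can be set as shown below before invoking sbt; on Windows it would be set via set or the system environment variables dialog instead. The value is simply the one described above:
export JAVA_OPTS="-Xms256m -Xmx4g"
sbt assembly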
More info on provided dependencies:
https://github.com/sbt/sbt-assembly
I ran into this issue before. In my environment, setting JAVA_OPTS did not work; I used the command below instead, and the out-of-memory error went away.