We are trying to build a fat jar containing one small Scala source file and a ton of dependencies (a simple MapReduce example using Spark and Cassandra):
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import com.datastax.spark.connector._
import org.apache.spark.SparkConf

object VMProcessProject {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .set("spark.cassandra.connection.host", "127.0.0.1")
      .set("spark.executor.extraClassPath", "C:\\Users\\SNCUser\\dataquest\\ScalaProjects\\lib\\spark-cassandra-connector-assembly-1.3.0-M2-SNAPSHOT.jar")
    println("got config")
    val sc = new SparkContext("spark://US-L15-0027:7077", "test", conf)
    println("Got spark context")

    val rdd = sc.cassandraTable("test_ks", "test_col")
    println("Got RDDs")
    println(rdd.count())
    val newRDD = rdd.map(x => 1)
    val count1 = newRDD.reduce((x, y) => x + y)
  }
}
We do not have a build.sbt file; instead we put the jars into a lib folder and the source files in src/main/scala, and run with sbt run. Our assembly.sbt file looks as follows:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
When we run sbt assembly we get the following error message:
...
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: java heap space
at java.util.concurrent...
We're not sure how to change the JVM settings to increase the memory, since we are using sbt assembly to make the jar. Also, if there is something egregiously wrong with how we are writing the code or building our project, that would help us out a lot too; there have been so many headaches trying to set up a basic Spark program!
sbt is essentially a Java process, so you can tune its runtime heap size to address the OutOfMemory issue.
For 0.13.x, the default memory options sbt uses are:
-Xms1024m -Xmx1024m -XX:ReservedCodeCacheSize=128m -XX:MaxPermSize=256m
You can enlarge the heap size by running something like:
sbt -J-Xms2048m -J-Xmx2048m assembly
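If you would rather not pass the flags on every invocation, the standard sbt launcher script also reads JVM options from an SBT_OPTS environment variable or a .sbtopts file in the project root; exactly what is honored depends on how sbt was installed, so treat the following .sbtopts contents as a sketch (the -J prefix passes each option straight through to the JVM running sbt):
-J-Xms2048m
-J-Xmx2048m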
I was including Spark as an unmanaged dependency (putting the jar file in the lib folder), which used a lot of memory during assembly because it is a huge jar. Instead, I made a build.sbt file which included Spark as a provided, unmanaged dependency, so it is left out of the fat jar (a minimal example is sketched below).
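A minimal build.sbt along these lines might look like the following. It pulls Spark in as a managed dependency marked provided rather than keeping the jar in lib, which achieves the same goal of keeping Spark out of the assembly; the artifact names and version numbers here are assumptions and should be matched to whatever the cluster actually runs:

name := "VMProcessProject"

version := "0.1"

scalaVersion := "2.10.5"

// Spark is marked "provided" so sbt-assembly leaves it out of the fat jar;
// the cluster supplies it at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1" % "provided"

// The Cassandra connector, by contrast, is bundled into the assembly.
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.3.0-M1"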
Secondly, I created the environment variable JAVA_OPTS with the value -Xms256m -Xmx4g, which sets the minimum heap size to 256 megabytes while allowing the heap to grow to a maximum of 4 gigabytes. These two changes combined allowed me to create a jar file with sbt assembly.
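For reference, on a Unix-like shell that variable can be set as shown below before invoking sbt; on Windows it would be set via set or the system environment variables dialog instead. The value is simply the one described above:
export JAVA_OPTS="-Xms256m -Xmx4g"
sbt assembly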
More info on provided dependencies:
https://github.com/sbt/sbt-assembly
I ran into this issue before. In my environment, setting JAVA_OPTS did not work; I used the command below instead, and the out-of-memory error went away.