
Spark off heap memory leak on Yarn with Kafka direct stream

I am running Spark Streaming 1.4.0 on YARN (Apache Hadoop 2.6.0) with Java 1.8.0_45 and the Kafka direct stream. I am also using Spark with Scala 2.11 support.

The issue I am seeing is that both the driver and executor containers gradually increase their physical memory usage until YARN kills them. I have configured up to 192M of heap and 384M of off-heap space for my driver, but it eventually runs out.

The heap memory appears to be fine, with regular GC cycles. No OutOfMemoryError has ever been encountered in any of these runs.

In fact, I am not generating any traffic on the Kafka topics, yet this still happens. Here is the code I am using:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object SimpleSparkStreaming extends App {

  val conf = new SparkConf()
  // Batch interval (in seconds) comes from spark.batch.window.size, defaulting to 1
  val ssc = new StreamingContext(conf, Seconds(conf.getLong("spark.batch.window.size", 1L)))
  ssc.checkpoint("checkpoint")

  val topics = Set(conf.get("spark.kafka.topic.name"))
  val kafkaParams = Map[String, String]("metadata.broker.list" -> conf.get("spark.kafka.broker.list"))
  val kafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

  // Print every message value on the executors
  kafkaStream.foreachRDD(rdd => {
    rdd.foreach(x => {
      println(x._2)
    })
  })
  // Also print the first few records of each batch on the driver
  kafkaStream.print()

  ssc.start()
  ssc.awaitTermination()
}

I am running this on CentOS 7. The command used for spark-submit is the following:

./bin/spark-submit --class com.rasa.cloud.prototype.spark.SimpleSparkStreaming \
--conf spark.yarn.executor.memoryOverhead=256 \
--conf spark.yarn.driver.memoryOverhead=384 \
--conf spark.kafka.topic.name=test \
--conf spark.kafka.broker.list=172.31.45.218:9092 \
--conf spark.batch.window.size=1 \
--conf spark.app.name="Simple Spark Kafka application" \
--master yarn-cluster \
--num-executors 1 \
--driver-memory 192m \
--executor-memory 128m \
--executor-cores 1 \
/home/centos/spark-poc/target/lib/spark-streaming-prototype-0.0.1-SNAPSHOT.jar 
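
For reference, a rough sketch of the YARN container sizes this command requests, assuming the usual Spark-on-YARN rule that a container is sized as the JVM heap plus the configured memoryOverhead (and ignoring any rounding up to the YARN minimum allocation):

driver container   = 192m (--driver-memory)   + 384m (spark.yarn.driver.memoryOverhead)   = 576m
executor container = 128m (--executor-memory) + 256m (spark.yarn.executor.memoryOverhead) = 384m

Once a container's physical memory footprint grows past its limit, YARN kills it, which is the behaviour described above.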

Any help is greatly appreciated

Regards,

Apoorva

asked Jul 13 '15 by Apoorva Sareen
1 Answer

Try increasing the executor cores. In your example the only core is dedicated to consuming the streaming data, leaving no cores to process the incoming data.
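
For example, a minimal sketch of the same submission with only the executor core count raised (2 is just an illustrative value; every other flag is unchanged from the question):

./bin/spark-submit --class com.rasa.cloud.prototype.spark.SimpleSparkStreaming \
--conf spark.yarn.executor.memoryOverhead=256 \
--conf spark.yarn.driver.memoryOverhead=384 \
--conf spark.kafka.topic.name=test \
--conf spark.kafka.broker.list=172.31.45.218:9092 \
--conf spark.batch.window.size=1 \
--conf spark.app.name="Simple Spark Kafka application" \
--master yarn-cluster \
--num-executors 1 \
--driver-memory 192m \
--executor-memory 128m \
--executor-cores 2 \
/home/centos/spark-poc/target/lib/spark-streaming-prototype-0.0.1-SNAPSHOT.jar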

answered Oct 20 '22 by Praneeth Reddy G