Spark (Kafka) Streaming Memory Issue

Question

I am testing my first Spark Streaming pipeline, which processes messages from Kafka. However, after several test runs, I got the following error message: There is insufficient memory for the Java Runtime Environment to continue.

My test data is really small, so this should not happen. After looking at the running processes, I suspect that previously submitted Spark jobs were not removed completely.

I am using Spark 2.2.1 and usually submit jobs like this:

/usr/local/spark/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 ~/script/to/spark_streaming.py

I stop it with Ctrl+C.

The last few lines of the script look like this:

ssc.start()
ssc.awaitTermination()

Update

After changing the way I submit the Spark Streaming job (command below), I still ran into the same issue: after I kill the job, the memory is not released. I only started Hadoop and Spark on those 4 EC2 nodes.

/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 --py-files ~/config.py --master spark://<master_IP>:7077 --deploy-mode client  ~/spark_kafka.py

rustyx · Accepted Answer

When you press Ctrl-C, only the submitter process is interrupted; the job itself continues to run. Eventually your system runs out of memory, so no new JVM can be started.
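To check whether earlier runs are still holding memory, one option is to list the JVMs on each node whose command line mentions a Spark entry point. The following is only a rough sketch: it assumes a Linux host where `ps -eo pid,rss,args` is available, and the helper names (`find_spark_processes`, `list_spark_processes`) and the marker list are illustrative, not part of Spark.

```python
import subprocess

# Class names that Spark JVMs usually carry on their command line
# (assumption: your deployment may show others as well).
SPARK_MARKERS = (
    "org.apache.spark.deploy.SparkSubmit",
    "org.apache.spark.executor.CoarseGrainedExecutorBackend",
)


def find_spark_processes(ps_output):
    """Return (pid, rss_kb, command) for each `ps` line that looks like a Spark JVM."""
    found = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header row and anything that is not "pid rss command".
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        pid, rss_kb, command = int(parts[0]), int(parts[1]), parts[2]
        if any(marker in command for marker in SPARK_MARKERS):
            found.append((pid, rss_kb, command))
    return found


def list_spark_processes():
    """Run `ps` on this machine and keep only the Spark-looking processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,rss,args"],
        capture_output=True, text=True, check=True,
    )
    return find_spark_processes(result.stdout)
```

Calling `list_spark_processes()` on a node after pressing Ctrl-C and printing the result should show whether a driver or executor from a previous run is still alive and how much resident memory (in KiB) it holds.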

Furthermore, even if you restart the cluster, all previously running jobs will be started again.

Read how to stop a running Spark application properly.
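As one possible way to make the script itself shut down cleanly, the final `ssc.start()` / `ssc.awaitTermination()` pair can be wrapped so that the streaming context is always stopped on the way out. This is a minimal sketch, not the only approach: `run_streaming` is a hypothetical helper name, and it assumes the interrupt reaches the Python driver as a `KeyboardInterrupt` and that the JVM is still reachable when `stop` is called. `ssc` is the `StreamingContext` the script already creates.

```python
def run_streaming(ssc):
    """Start the StreamingContext `ssc` and always stop it before returning."""
    ssc.start()
    try:
        ssc.awaitTermination()
    except KeyboardInterrupt:
        # Ctrl+C in the driver's terminal is expected to land here.
        print("Interrupted, stopping the streaming context ...")
    finally:
        # Stop the streaming context and the underlying SparkContext.
        # stopGraceFully=True asks Spark to finish processing data that
        # was already received before shutting down.
        ssc.stop(stopSparkContext=True, stopGraceFully=True)
```

In the script from the question, the last two lines would then become a single call, `run_streaming(ssc)`. Any driver or executor that is already orphaned from an earlier run still has to be killed separately.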

Tags: java, out-of-memory, apache-kafka, apache-spark
