 

Pyspark: TaskMemoryManager: Failed to allocate a page: Need help in Error Analysis

I am facing these errors while running a Spark job in standalone cluster mode.

My Spark job:

  • runs some groupBy,
  • count,
  • and join operations to get a final DataFrame, and then calls df.toPandas().to_csv().
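
A minimal sketch of a job with this shape, for reference (column names, file paths, and the session setup are hypothetical placeholders, not the actual job):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("example-job").getOrCreate()

# ~524 MB input dataset
df = spark.read.csv("input.csv", header=True, inferSchema=True)

# groupby + count, then a join to produce the final DataFrame
counts = df.groupBy("key_col").count()
final_df = df.join(counts, on="key_col", how="left")

# toPandas() pulls the entire result into driver memory, then writes one local CSV
final_df.toPandas().to_csv("output.csv", index=False)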

The input dataset is 524 MB. The error I get:

WARN TaskMemoryManager: Failed to allocate a page (33554432 bytes), try again.

After the above warning repeats several times, new errors appear:

  1. WARN NettyRpcEnv: Ignored failure: java.util.concurrent.TimeoutException: Cannot receive any reply in 10 seconds

  2. org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval

  3. at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException

  4. ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 158295 ms

  5. Exception happened during processing of request from ('127.0.0.1', 49128)

     Traceback (most recent call last):
       File "/home/stp/spark-2.0.0-bin-hadoop2.7/python/pyspark/accumulators.py", line 235, in handle
         num_updates = read_int(self.rfile)
       File "/home/stp/spark-2.0.0-bin-hadoop2.7/python/pyspark/serializers.py", line 545, in read_int
         raise EOFError
     EOFError

  6. And finally:

    py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server (127.0.0.1:38073)

On first thought, I assumed the error might be a memory issue (TaskMemoryManager), but out of 16 GB total the process was consuming at most 6 GB, leaving 9+ GB free. I had also set the driver memory to 10 GB, so I ruled memory out.

However, when I do a count() or show() on my final DataFrame, it succeeds; it is the toPandas().to_csv() step that throws the errors/warnings above.
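
For reference, the difference between the successful calls and the failing one (the DataFrame name and output path are placeholders):

# These bring only a small result (a single number / a handful of rows) back to the driver:
final_df.count()
final_df.show(20)

# This collects the entire DataFrame into driver memory as a pandas object
# before writing a single local CSV:
final_df.toPandas().to_csv("output.csv", index=False)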

I don't actually understand what might be causing the issue.

Please help me analyze the above errors. Any help/comment is welcome. Thanks.

Satya asked Oct 10 '16


1 Answer

In our case, we had a lot of smaller tables (< 10 MB), so we decided to disable broadcast joins and, in addition, started using G1GC for garbage collection. Add these entries to your spark-defaults.conf file in $SPARK_HOME/conf:

spark.driver.extraJavaOptions -XX:+UseG1GC
spark.executor.extraJavaOptions  -XX:+UseG1GC
spark.sql.autoBroadcastJoinThreshold    -1

Alternatively, you can adjust the autoBroadcast threshold size and see if that solves the issue.
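
The broadcast threshold (unlike the GC flags, which must be set before the JVM starts) is a runtime SQL setting, so if you prefer not to edit spark-defaults.conf, a sketch like the following should also work (session and variable names are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("broadcast-tuning").getOrCreate()

# Disable automatic broadcast joins entirely ...
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)

# ... or adjust the size cutoff (in bytes) below which tables are broadcast,
# e.g. 20 MB instead of the default 10 MB.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 20 * 1024 * 1024)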

user2608613 answered Oct 13 '22