I am facing these errors while running a Spark job in standalone cluster mode. The job does a groupBy, a count, and a few joins to build a final DataFrame, and then calls df.toPandas().to_csv() to write the result; a rough sketch of the pipeline is below.
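For context, the pipeline looks roughly like this (a minimal sketch only; the input path, column names, and join key are placeholders, not my real ones):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("my_job").getOrCreate()

# placeholder input path and schema options
df = spark.read.csv("/path/to/input.csv", header=True, inferSchema=True)

# groupby + count, then join back onto the original data to get the final df
counts = df.groupBy("key_col").count()
final_df = df.join(counts, on="key_col", how="left")

# this is the step that fails
final_df.toPandas().to_csv("/path/to/output.csv", index=False)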
The input dataset is about 524 MB. The error I get:
WARN TaskMemoryManager: Failed to allocate a page (33554432 bytes), try again.
After multiple times repeating the above , again new error
WARN NettyRpcEnv: Ignored failure: java.util.concurrent.TimeoutException: Cannot receive any reply in 10 seconds
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException
ExecutorLostFailure (executor driver exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 158295 ms
Exception happened during processing of request from ('127.0.0.1', 49128)
Traceback (most recent call last):
  File "/home/stp/spark-2.0.0-bin-hadoop2.7/python/pyspark/accumulators.py", line 235, in handle
    num_updates = read_int(self.rfile)
  File "/home/stp/spark-2.0.0-bin-hadoop2.7/python/pyspark/serializers.py", line 545, in read_int
    raise EOFError
EOFError
And finally:
py4j.protocol.Py4JNetworkError: An error occurred while trying to connect to the Java server (127.0.0.1:38073)
On first thought, I assumed the error was a memory issue (the TaskMemoryManager warning), but out of 16 GB total, the process was consuming at most 6 GB, leaving 9+ GB free. I had also set the driver memory to 10 GB, so that did not seem to be the problem.
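For reference, the driver memory was set roughly like this (the master URL and script name are placeholders):

spark-submit --master spark://<master-host>:7077 --driver-memory 10g my_job.py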
Also, a count() or show() on my final DataFrame succeeds; it is only the toPandas().to_csv() step that throws the above errors/warnings.
I don't really understand what might be causing the issue. Please help me analyze the above errors. Any help/comment is welcome. Thanks.
In our case, we had a lot of small tables (< 10 MB), so we decided to disable broadcast joins and, in addition, started using G1GC for garbage collection. Add these entries to your spark-defaults.conf file in $SPARK_HOME/conf:
spark.driver.extraJavaOptions -XX:+UseG1GC
spark.executor.extraJavaOptions -XX:+UseG1GC
spark.sql.autoBroadcastJoinThreshold -1
Alternatively, you can adjust the threshold size for auto-broadcast and see if that solves the issue.
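For example, to keep broadcast joins only for very small tables instead of disabling them entirely, set an explicit byte limit (10 MB here is just an illustrative value), either in spark-defaults.conf:

spark.sql.autoBroadcastJoinThreshold 10485760

or at runtime in PySpark:

spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10 * 1024 * 1024)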