I did not configure any timeout value but used the default settings. Where is this 3600-second timeout configured, and how can I solve it?
Error message:
18/01/10 13:51:44 WARN Executor: Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [3600 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:738)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply$mcV$sp(Executor.scala:767)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:767)
at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:767)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1948)
at org.apache.spark.executor.Executor$$anon$2.run(Executor.scala:767)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [3600 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
... 14 more
Spark uses a master/slave architecture: one central coordinator (the driver) communicates with many distributed workers (executors). The driver and each of the executors run in their own Java processes.
spark.executor.heartbeatInterval is the interval at which each executor reports its heartbeats to the driver. If garbage collection takes a long time on an executor, increasing spark.network.timeout gives the driver more time to wait for a response before it marks the executor as lost and starts a new one.
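For example, both values can be set together in $SPARK_HOME/conf/spark-defaults.conf (the values below are illustrative, not recommendations; the heartbeat interval should stay well below the network timeout):

spark.executor.heartbeatInterval 60s
spark.network.timeout 600s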
To increase the Spark shuffle service memory, modify SPARK_DAEMON_MEMORY in $SPARK_HOME/conf/spark-env.sh (the default value is 2g), then restart the shuffle service for the change to take effect.
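A single line in spark-env.sh is enough (4g is an illustrative value):

export SPARK_DAEMON_MEMORY=4g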
According to the recommendations discussed above (the figures imply a cluster of 10 nodes with 150 cores in total and 64GB of memory per node):
Number of available executors = total cores / cores per executor = 150 / 5 = 30.
Leaving 1 executor for the YARN ApplicationMaster gives --num-executors = 29.
Executors per node = 30 / 10 = 3.
Memory per executor = 64GB / 3 ≈ 21GB.
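Plugged into a spark-submit invocation, these numbers would look like this (a sketch; the application JAR is a hypothetical placeholder, and in practice the executor memory is often reduced slightly to leave room for memory overhead):

spark-submit \
  --num-executors 29 \
  --executor-cores 5 \
  --executor-memory 21g \
  your-app.jar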
In the error message it says:
This timeout is controlled by spark.executor.heartbeatInterval
Hence, the first thing to try is increasing this value. This can be done in several ways, for example by increasing it to 10000 seconds:
When using spark-submit
simply add the flag:
--conf spark.executor.heartbeatInterval=10000s
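A complete command might then look like this (the class name and application JAR are hypothetical placeholders; spark.network.timeout is raised as well, since it should stay larger than the heartbeat interval):

spark-submit \
  --class com.example.MyApp \
  --conf spark.executor.heartbeatInterval=10000s \
  --conf spark.network.timeout=12000s \
  my-app.jar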
When using spark-defaults.conf
add a line to $SPARK_HOME/conf/spark-defaults.conf:
spark.executor.heartbeatInterval 10000s
When creating a new SparkSession
in your program, add a config parameter (Scala):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .config("spark.executor.heartbeatInterval", "10000s")
  .getOrCreate()
If this does not help, it could be a good idea to also increase the value of spark.network.timeout, another common source of problems related to these kinds of timeouts. The Spark documentation recommends keeping spark.executor.heartbeatInterval significantly smaller than spark.network.timeout, so raise the two together.
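For completeness, a minimal sketch setting both values when creating the SparkSession (the values are illustrative, not recommendations):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .config("spark.executor.heartbeatInterval", "10000s")
  // Keep the network timeout larger than the heartbeat interval.
  .config("spark.network.timeout", "12000s")
  .getOrCreate()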