
Spark: Could not find CoarseGrainedScheduler

I'm not sure what's causing this exception; it shows up after my Spark job has been running for a few hours.

I'm running Spark 2.0.2.

Any debugging tips?

2016-12-27 03:11:22,199 [shuffle-server-3] ERROR org.apache.spark.network.server.TransportRequestHandler - Error while invoking RpcHandler#receive() for one-way message.
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
    at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:154)
    at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:134)
    at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:571)
    at org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:180)
    at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEve
asked Dec 27 '16 by Adetiloye Philip Kehinde

4 Answers

Now I know the meaning of that cryptic exception: the executor was killed because it exceeded the container memory threshold.
There are a couple of reasons this can happen, but the first thing to check is the job itself (e.g. repartition so each task handles less data), or try adding more nodes/executors to your cluster.
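
For illustration, here is a minimal Scala sketch of the repartition idea. Everything specific in it (the input/output paths, the column name, the partition count) is made up for the example; the point is simply that more, smaller partitions mean each task holds less data in memory at once.

    import org.apache.spark.sql.SparkSession

    object RepartitionExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("repartition-example")
          .getOrCreate()

        // Hypothetical input; spread the data over more, smaller partitions
        // so each task (and therefore each executor) holds less in memory at once.
        val events = spark.read.parquet("/data/events")
        val repartitioned = events.repartition(400)   // partition count is illustrative

        repartitioned
          .groupBy("userId")                          // column name is made up
          .count()
          .write
          .parquet("/data/event_counts")              // hypothetical output path

        spark.stop()
      }
    }

Whether that is enough depends on your data volume and on how much memory each executor container actually gets.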

answered by Adetiloye Philip Kehinde


Basically it means there is another, underlying reason for the failure. Look for other exceptions in your job logs.

See "Exceptions" sections here: https://medium.com/@wx.london.cun/spark-on-yarn-f74e82ab6070

answered by Tomer


It could be a resource problem. Try to increase the number of cores and executors, and assign more RAM to the application; then increase the partition count of your RDD by calling repartition. The ideal number of partitions depends on the previous settings. Hope this helps.
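
As a rough sketch of how those knobs map to Spark configuration (the values below are placeholders, not recommendations; size them to your cluster and workload):

    import org.apache.spark.sql.SparkSession

    // Illustrative values only; tune to your cluster and workload.
    val spark = SparkSession.builder()
      .appName("resource-tuning-example")
      .config("spark.executor.instances", "10")   // more executors (YARN)
      .config("spark.executor.cores", "4")        // more cores per executor
      .config("spark.executor.memory", "8g")      // more RAM per executor
      .config("spark.driver.memory", "4g")
      .getOrCreate()

    // Then raise the parallelism of the data itself, as suggested above.
    val df = spark.read.parquet("/data/input")    // hypothetical path
    val tuned = df.repartition(10 * 4 * 3)        // roughly 2-3 tasks per core is a common rule of thumb

The same settings can also be passed on the spark-submit command line instead of being set in code.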

answered by Beniamino Del Pizzo


Another silly reason could be that the timeout you pass to Spark Streaming's awaitTermination is too short, so the streaming context is terminated before the job completes:

ssc.awaitTermination(timeout)   # timeout: time to wait, in seconds
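
For reference, a minimal Scala streaming skeleton illustrating the point; the batch interval, source, and timeout value are all placeholders. In the Scala API the timed variant is awaitTerminationOrTimeout and takes milliseconds, while the one-argument awaitTermination(timeout) shown above matches the PySpark signature, where the timeout is in seconds.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("streaming-timeout-example")
    val ssc  = new StreamingContext(conf, Seconds(10))   // batch interval is illustrative

    ssc.socketTextStream("localhost", 9999)              // hypothetical source
      .count()
      .print()

    ssc.start()

    // Either block until the job is stopped externally...
    ssc.awaitTermination()

    // ...or, if you do pass a timeout, make sure it is long enough
    // (milliseconds in the Scala API), e.g. 24 hours:
    // ssc.awaitTerminationOrTimeout(24L * 60 * 60 * 1000)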
answered by Sathish