Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Running yarn with spark not working with Java 8

I have cluster with 1 master and 6 slaves which uses pre-built version of hadoop 2.6.0 and spark 1.6.2. I was running hadoop MR and spark jobs without any problem with openjdk 7 installed on all the nodes. However when I upgraded openjdk 7 to openjdk 8 on all nodes, spark submit and spark-shell with yarn caused error.

16/08/17 14:06:22 ERROR client.TransportClient: Failed to send RPC 4688442384427245199 to /xxx.xxx.xxx.xx:42955: java.nio.channels.ClosedChannelExce      ption
java.nio.channels.ClosedChannelException
16/08/17 14:06:22 WARN netty.NettyRpcEndpointRef: Error sending message [message = RequestExecutors(0,0,Map())] in 1 attempts
org.apache.spark.SparkException: Exception thrown in awaitResult
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
        at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply$m      cV$sp(YarnSchedulerBackend.scala:271)
        at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(Y      arnSchedulerBackend.scala:271)
        at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$1.apply(Y      arnSchedulerBackend.scala:271)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to send RPC 4688442384427245199 to /xxx.xxx.xxx.xx:42955: java.nio.channels.ClosedChannelException
        at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
        at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
        at io.netty.util.concurrent.DefaultPromise$LateListeners.run(DefaultPromise.java:845)
        at io.netty.util.concurrent.DefaultPromise$LateListenerNotifier.run(DefaultPromise.java:873)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        ... 1 more
Caused by: java.nio.channels.ClosedChannelException
16/08/17 14:06:22 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
        at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:581)
        at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:162)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:549)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:236)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:211)
        at java.lang.Thread.run(Thread.java:745)
Traceback (most recent call last):
  File "/home/hd_spark/spark2/python/pyspark/shell.py", line 49, in <module>
    spark = SparkSession.builder.getOrCreate()
  File "/home/hd_spark/spark2/python/pyspark/sql/session.py", line 169, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/home/hd_spark/spark2/python/pyspark/context.py", line 294, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/home/hd_spark/spark2/python/pyspark/context.py", line 115, in __init__
    conf, jsc, profiler_cls)
  File "/home/hd_spark/spark2/python/pyspark/context.py", line 168, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/home/hd_spark/spark2/python/pyspark/context.py", line 233, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/home/hd_spark/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 1183, in __call__
  File "/home/hd_spark/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Spark context stopped while waiting for backend
        at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:581)
        at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:162)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:549)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:236)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:211)
        at java.lang.Thread.run(Thread.java:745)

I have exported JAVA_HOME on .bashrc and have set the openjdk 8 as default java using

sudo update-alternatives --config java
sudo update-alternatives --config javac

these commands. Also I have tried with oracle java 8 and the same error comes up. The container logs on the slave nodes have same error as below.

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/tmp/hadoop-hd_spark/nm-local-dir/usercache/hd_spark/filecache/17/__spark_libs__8247267244939901627.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/17 14:05:11 INFO executor.CoarseGrainedExecutorBackend: Started daemon with process name: 23541@slave01
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for TERM
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for HUP
16/08/17 14:05:11 INFO util.SignalUtils: Registered signal handler for INT
16/08/17 14:05:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/17 14:05:11 INFO spark.SecurityManager: Changing view acls to: hd_spark
16/08/17 14:05:11 INFO spark.SecurityManager: Changing modify acls to: hd_spark
16/08/17 14:05:11 INFO spark.SecurityManager: Changing view acls groups to:
16/08/17 14:05:11 INFO spark.SecurityManager: Changing modify acls groups to:
16/08/17 14:05:11 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hd_spark); groups with view permissions: Set(); users  with modify permissions: Set(hd_spark); groups with modify permissions: Set()
16/08/17 14:05:12 INFO client.TransportClientFactory: Successfully created connection to /xxx.xxx.xxx.xx:37417 after 78 ms (0 ms spent in bootstraps)
16/08/17 14:05:12 INFO spark.SecurityManager: Changing view acls to: hd_spark
16/08/17 14:05:12 INFO spark.SecurityManager: Changing modify acls to: hd_spark
16/08/17 14:05:12 INFO spark.SecurityManager: Changing view acls groups to:
16/08/17 14:05:12 INFO spark.SecurityManager: Changing modify acls groups to:
16/08/17 14:05:12 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hd_spark); groups with view permissions: Set(); users  with modify permissions: Set(hd_spark); groups with modify permissions: Set()
16/08/17 14:05:12 INFO client.TransportClientFactory: Successfully created connection to /xxx.xxx.xxx.xx:37417 after 1 ms (0 ms spent in bootstraps)
16/08/17 14:05:12 INFO storage.DiskBlockManager: Created local directory at /tmp/hadoop-hd_spark/nm-local-dir/usercache/hd_spark/appcache/application_1471352972661_0005/blockmgr-d9f23a56-1420-4cd4-abfd-ae9e128c688c
16/08/17 14:05:12 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
16/08/17 14:05:12 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: spark://[email protected]:37417
16/08/17 14:05:13 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
16/08/17 14:05:13 INFO storage.DiskBlockManager: Shutdown hook called
16/08/17 14:05:13 INFO util.ShutdownHookManager: Shutdown hook called

I have tried with spark 1.6.2 pre-built version, spark 2.0 pre-built version and also tried with spark 2.0 by building it myself.

Hadoop job works perfectly even after upgrading to java 8. When i switch back to java 7, spark works fine.

My scala version is 2.11 and OS is Ubuntu 14.04.4 LTS .

It will be very great if someone can give me an idea to solve this problem.

Thanks!

ps I have changed my IP address as xxx.xxx.xxx.xx on the logs.

like image 710
jmoa Avatar asked Aug 17 '16 05:08

jmoa


1 Answers

As of September 12, 2016, this is a blocker issue: https://issues.apache.org/jira/browse/YARN-4714

You can overcome this by setting up the following properties in yarn-site.xml

<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
like image 148
simpleJack Avatar answered Oct 26 '22 00:10

simpleJack