Spark fail when running pi.py example with yarn-client mode

Tags:

apache-spark

I can successfully run the Java version of the Pi example as follows:

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-client \
    --num-executors 3 \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue thequeue \
    lib/spark-examples*.jar \
    10

However, the Python version fails with the error below. I used yarn-client mode; starting the pyspark shell in yarn-client mode gives the same output. Can anyone help me figure out this problem?

nlp@yyy2:~/spark$ ./bin/spark-submit --master yarn-client examples/src/main/python/pi.py 
15/01/05 17:22:26 INFO spark.SecurityManager: Changing view acls to: nlp 
15/01/05 17:22:26 INFO spark.SecurityManager: Changing modify acls to: nlp 
15/01/05 17:22:26 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nlp); users with modify permissions: Set(nlp) 
15/01/05 17:22:26 INFO slf4j.Slf4jLogger: Slf4jLogger started 
15/01/05 17:22:26 INFO Remoting: Starting remoting 
15/01/05 17:22:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@yyy2:42747] 
15/01/05 17:22:26 INFO util.Utils: Successfully started service 'sparkDriver' on port 42747. 
15/01/05 17:22:26 INFO spark.SparkEnv: Registering MapOutputTracker 
15/01/05 17:22:26 INFO spark.SparkEnv: Registering BlockManagerMaster 
15/01/05 17:22:26 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150105172226-aeae 
15/01/05 17:22:26 INFO storage.MemoryStore: MemoryStore started with capacity 265.1 MB 
15/01/05 17:22:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
15/01/05 17:22:27 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-cbe0079b-79c5-426b-b67e-548805423b11 
15/01/05 17:22:27 INFO spark.HttpServer: Starting HTTP Server 
15/01/05 17:22:27 INFO server.Server: jetty-8.y.z-SNAPSHOT 
15/01/05 17:22:27 INFO server.AbstractConnector: Started [email protected]:57169 
15/01/05 17:22:27 INFO util.Utils: Successfully started service 'HTTP file server' on port 57169. 
15/01/05 17:22:27 INFO server.Server: jetty-8.y.z-SNAPSHOT 
15/01/05 17:22:27 INFO server.AbstractConnector: Started [email protected]:4040 
15/01/05 17:22:27 INFO util.Utils: Successfully started service 'SparkUI' on port 4040. 
15/01/05 17:22:27 INFO ui.SparkUI: Started SparkUI at http://yyy2:4040
15/01/05 17:22:27 INFO client.RMProxy: Connecting to ResourceManager at yyy14/10.112.168.195:8032 
15/01/05 17:22:27 INFO yarn.Client: Requesting a new application from cluster with 6 NodeManagers 
15/01/05 17:22:27 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container) 
15/01/05 17:22:27 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead 
15/01/05 17:22:27 INFO yarn.Client: Setting up container launch context for our AM 
15/01/05 17:22:27 INFO yarn.Client: Preparing resources for our AM container 
15/01/05 17:22:28 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 24 for xxx on ha-hdfs:hzdm-cluster1 
15/01/05 17:22:28 INFO yarn.Client: Uploading resource file:/home/nlp/platform/spark-1.2.0-bin-2.5.2/lib/spark-assembly-1.2.0-hadoop2.5.2.jar -> hdfs://hzdm-cluster1/user/nlp/.sparkStaging/application_1420444011562_0023/spark-assembly-1.2.0-hadoop2.5.2.jar 
15/01/05 17:22:29 INFO yarn.Client: Uploading resource file:/home/nlp/platform/spark-1.2.0-bin-2.5.2/examples/src/main/python/pi.py -> hdfs://hzdm-cluster1/user/nlp/.sparkStaging/application_1420444011562_0023/pi.py 
15/01/05 17:22:29 INFO yarn.Client: Setting up the launch environment for our AM container 
15/01/05 17:22:29 INFO spark.SecurityManager: Changing view acls to: nlp 
15/01/05 17:22:29 INFO spark.SecurityManager: Changing modify acls to: nlp 
15/01/05 17:22:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(nlp); users with modify permissions: Set(nlp) 
15/01/05 17:22:29 INFO yarn.Client: Submitting application 23 to ResourceManager 
15/01/05 17:22:30 INFO impl.YarnClientImpl: Submitted application application_1420444011562_0023 
15/01/05 17:22:31 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED) 
15/01/05 17:22:31 INFO yarn.Client: 
         client token: Token { kind: YARN_CLIENT_TOKEN, service:  } 
         diagnostics: N/A 
         ApplicationMaster host: N/A 
         ApplicationMaster RPC port: -1 
         queue: root.default 
         start time: 1420449749969 
         final status: UNDEFINED 
         tracking URL: http://yyy14:8070/proxy/application_1420444011562_0023/
         user: nlp 
15/01/05 17:22:32 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED) 
15/01/05 17:22:33 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED) 
15/01/05 17:22:34 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED) 
15/01/05 17:22:35 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED) 
15/01/05 17:22:36 INFO yarn.Client: Application report for application_1420444011562_0023 (state: ACCEPTED) 
15/01/05 17:22:36 INFO cluster.YarnClientSchedulerBackend: ApplicationMaster registered as Actor[akka.tcp://sparkYarnAM@yyy16:52855/user/YarnAM#435880073] 
15/01/05 17:22:36 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> yyy14, PROXY_URI_BASES -> http://yyy14:8070/proxy/application_1420444011562_0023), /proxy/application_1420444011562_0023 
15/01/05 17:22:36 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 
15/01/05 17:22:37 INFO yarn.Client: Application report for application_1420444011562_0023 (state: RUNNING) 
15/01/05 17:22:37 INFO yarn.Client: 
         client token: Token { kind: YARN_CLIENT_TOKEN, service:  } 
         diagnostics: N/A 
         ApplicationMaster host: yyy16 
         ApplicationMaster RPC port: 0 
         queue: root.default 
         start time: 1420449749969 
         final status: UNDEFINED 
         tracking URL: http://yyy14:8070/proxy/application_1420444011562_0023/
         user: nlp 
15/01/05 17:22:37 INFO cluster.YarnClientSchedulerBackend: Application application_1420444011562_0023 has started running. 
15/01/05 17:22:37 INFO netty.NettyBlockTransferService: Server created on 35648 
15/01/05 17:22:37 INFO storage.BlockManagerMaster: Trying to register BlockManager 
15/01/05 17:22:37 INFO storage.BlockManagerMasterActor: Registering block manager yyy2:35648 with 265.1 MB RAM, BlockManagerId(<driver>, yyy2, 35648) 
15/01/05 17:22:37 INFO storage.BlockManagerMaster: Registered BlockManager 
15/01/05 17:22:37 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkYarnAM@yyy16:52855] has failed, address is now gated for [5000] ms. Reason is: [Disassociated]. 
15/01/05 17:22:38 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED! 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/json,null} 
15/01/05 17:22:38 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs,null} 
15/01/05 17:22:38 INFO ui.SparkUI: Stopped Spark web UI at http://yyy2:4040
15/01/05 17:22:38 INFO scheduler.DAGScheduler: Stopping DAGScheduler 
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors 
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down 
15/01/05 17:22:38 INFO cluster.YarnClientSchedulerBackend: Stopped 
15/01/05 17:22:39 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor stopped! 
15/01/05 17:22:39 INFO storage.MemoryStore: MemoryStore cleared 
15/01/05 17:22:39 INFO storage.BlockManager: BlockManager stopped 
15/01/05 17:22:39 INFO storage.BlockManagerMaster: BlockManagerMaster stopped 
15/01/05 17:22:39 INFO spark.SparkContext: Successfully stopped SparkContext 
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 
15/01/05 17:22:39 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down. 
15/01/05 17:22:57 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms) 
Traceback (most recent call last): 
  File "/home/nlp/platform/spark-1.2.0-bin-2.5.2/examples/src/main/python/pi.py", line 29, in <module>
    sc = SparkContext(appName="PythonPi") 
  File "/home/nlp/spark/python/pyspark/context.py", line 105, in __init__ 
    conf, jsc) 
  File "/home/nlp/spark/python/pyspark/context.py", line 153, in _do_init 
    self._jsc = jsc or self._initialize_context(self._conf._jconf) 
  File "/home/nlp/spark/python/pyspark/context.py", line 201, in _initialize_context 
    return self._jvm.JavaSparkContext(jconf) 
  File "/home/nlp/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__ 
  File "/home/nlp/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value 
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. 
: java.lang.NullPointerException 
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:497) 
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61) 
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) 
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
        at java.lang.reflect.Constructor.newInstance(Constructor.java:408) 
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234) 
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) 
        at py4j.Gateway.invoke(Gateway.java:214) 
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) 
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) 
        at py4j.GatewayConnection.run(GatewayConnection.java:207) 
        at java.lang.Thread.run(Thread.java:745)
asked Jan 06 '15 by Feng


2 Answers

If you're running this example on Java 8, the failure may be due to Java 8's aggressive memory allocation, which can push containers over YARN's virtual memory limit and cause them to be killed: https://issues.apache.org/jira/browse/YARN-4714

You can force YARN to skip these checks by setting the following properties in yarn-site.xml:

<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
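
A gentler alternative (not part of the original answer, just a common workaround) is to leave the checks enabled and give containers more headroom instead: raise YARN's virtual-to-physical memory ratio, or increase Spark's per-executor memory overhead. The values below are illustrative only; adjust them for your cluster and verify the property names against your Spark/Hadoop versions.

<!-- yarn-site.xml: allow more virtual memory per MB of physical memory -->
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
</property>

# spark-submit: reserve extra off-heap memory per executor (value in MB)
./bin/spark-submit --master yarn-client \
    --conf spark.yarn.executor.memoryOverhead=1024 \
    examples/src/main/python/pi.py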
answered Nov 15 '22 by simpleJack


Try the deploy-mode parameter, like this:

--deploy-mode cluster

I had a problem like yours, and with this parameter it worked.
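
For reference, a full submission in cluster mode might look like the sketch below. This assumes a Spark version that supports Python applications in cluster deploy mode; the pi.py path and the trailing partition count are taken from the examples in the question. In cluster mode the driver runs inside the YARN ApplicationMaster instead of on the submitting machine, which changes where the SparkContext gets created.

# sketch only: submit the Python Pi example with the driver running on YARN
./bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    examples/src/main/python/pi.py \
    10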

answered Nov 15 '22 by Robson Ventura Rodrigues