
Unable to add a new service with Cloudera Manager within Cloudera Quickstart VM 5.3.0

I'm using Cloudera Quickstart VM 5.3.0 (running in VirtualBox 4.3 on Windows 7) and I want to learn Spark (on YARN).

I started Cloudera Manager. In the sidebar I can see all the services; Spark is listed, but only in standalone mode. So I clicked "Add a new service" and selected "Spark". Next I had to select the set of dependencies for this service; I was given no choice and had to pick HDFS/YARN/ZooKeeper. In the following step I had to choose a History Server and a Gateway. I run the VM in local mode, so the only host I could choose was localhost.

I clicked "Continue" and this error occurred (plus 69 stack-trace lines):

A server error has occurred. Send the following information to Cloudera.

Path : http://localhost:7180/cmf/clusters/1/add-service/reviewConfig

Version: Cloudera Express 5.3.0 (#155 built by jenkins on 20141216-1458 git: e9aae1d1d1ce2982d812b22bd1c29ff7af355226)

org.springframework.web.bind.MissingServletRequestParameterException:Required long parameter 'serviceId' is not present at AnnotationMethodHandlerAdapter.java line 738 in org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter$ServletHandlerMethodInvoker raiseMissingParameterException()

I don't know whether an internet connection is needed, but I should mention that the VM can't connect to the internet. (EDIT: Even with an internet connection I get the same error.)

I have no idea how to add this service. I tried with and without a gateway and with various network options, but it never worked. I checked the known issues and found nothing.

Does anyone know how I can solve this error or work around it? Thanks for any help.

Julien Navarre asked Oct 31 '22 07:10


1 Answer

Julien,

Before I answer your question I'd like to make some general notes about Spark in Cloudera Distribution of Hadoop 5 (CDH5):

  1. Spark runs in three different modes: (1) local, (2) Spark's own standalone cluster manager, and (3) other cluster resource managers such as Hadoop YARN and Apache Mesos. (Amazon EC2 is a place to deploy a Spark cluster rather than a resource manager in its own right.)
  2. Spark works out of the box with CDH 5 for (1) and (2). You can start a local interactive Spark session without passing any arguments, using the spark-shell command for Scala or pyspark for Python. I find that the interactive Scala and Python interpreters help when learning to program with Resilient Distributed Datasets (RDDs).

I was able to reproduce your error on my CDH 5.3.x distribution. I don't mean to take credit for the bug you discovered, but I have posted it to the Cloudera developer community for feedback.

To use Spark in the QuickStart pseudo-distributed environment, first check whether the Spark daemons are running with the following command (you can also check this in the Cloudera Manager (CM) UI):

[cloudera@quickstart simplesparkapp]$ sudo service --status-all | grep -i spark
Spark history-server is not running                        [FAILED]
Spark master is not running                                [FAILED]
Spark worker is not running                                [FAILED]

I've manually stopped all of the standalone Spark services so that we can try submitting the Spark job to YARN.
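If you want to script that check instead of reading the output by eye, here is a minimal sketch. The function name spark_daemons_stopped is hypothetical, and it assumes that any Spark status line without the phrase "is not running" means the daemon is up. The sample input is the status output shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads `service --status-all` output on stdin and
# succeeds (exit 0) only if no Spark line reports a running daemon.
# Assumption: a Spark line lacking "is not running" means the daemon is up.
spark_daemons_stopped() {
    awk 'tolower($0) ~ /spark/ && $0 !~ /is not running/ { running = 1 }
         END { exit running + 0 }'
}

# Demo on the status lines from this answer.
if spark_daemons_stopped <<'EOF'
Spark history-server is not running                        [FAILED]
Spark master is not running                                [FAILED]
Spark worker is not running                                [FAILED]
EOF
then
    echo "standalone Spark daemons are stopped"
else
    echo "a standalone Spark daemon is still running"
fi
```

On the VM itself you would pipe the real output into the function, for example sudo service --status-all | spark_daemons_stopped.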

To run Spark inside a YARN container on the QuickStart cluster, we have to do the following:

  1. Set HADOOP_CONF_DIR to the directory containing the yarn-site.xml configuration file. This is typically /etc/hadoop/conf in CDH5. You can set this variable using the command export HADOOP_CONF_DIR="/etc/hadoop/conf".
  2. Submit the job using spark-submit and specify that you are using Hadoop YARN.

    spark-submit --class CLASS_PATH --master yarn JAR_DIR ARGS

  3. Check the job status in Hue and compare it with the Spark History Server. Hue should show the job placed in a generic YARN container, and the Spark History Server should have no record of the submitted job.
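
Steps 1 and 2 can be combined into a small script. This is only a sketch: the class name com.example.SimpleApp, the jar path target/simple-app.jar, the argument input.txt, and the helper build_submit_cmd are all hypothetical placeholders, so substitute your own. The --master yarn value mirrors the command above. The Spark 1.x documentation also spells the master as yarn-client or yarn-cluster, so check which form your Spark version expects.

```shell
#!/bin/sh
# Step 1: point Spark at the Hadoop/YARN client configuration.
# /etc/hadoop/conf is the typical CDH5 location mentioned above.
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"

# Hypothetical application details; replace with your own.
APP_CLASS="com.example.SimpleApp"
APP_JAR="target/simple-app.jar"

# Hypothetical helper: prints the spark-submit command line.
# $1 = main class, $2 = application jar, remaining args = application args.
build_submit_cmd() {
    class="$1"
    jar="$2"
    shift 2
    echo "spark-submit --class $class --master yarn $jar $*"
}

cmd=$(build_submit_cmd "$APP_CLASS" "$APP_JAR" input.txt)
echo "$cmd"

# Step 2: submit only if spark-submit and yarn-site.xml are actually present.
if command -v spark-submit >/dev/null 2>&1 && [ -f "$HADOOP_CONF_DIR/yarn-site.xml" ]; then
    # Unquoted on purpose so the string splits into words; this is fine
    # only while none of the arguments contain spaces.
    $cmd
else
    echo "spark-submit or yarn-site.xml not found; command not run" >&2
fi
```

Printing the command before running it makes it easy to confirm what is being sent to YARN before you check the result in Hue (step 3).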
References used:
  • Learning Spark, Chapter 7
  • Sandy Ryza's Blog Post on Spark and CDH5
  • Spark Documentation for Running on Yarn
Myles Baker answered Nov 15 '22 11:11