
Is it possible to configure Apache Livy to run with Spark Standalone?

On the machine on which I installed Apache Livy (Ubuntu 16.04):

(a) Is it possible to run it on Spark Standalone mode?

I am thinking of using Spark 1.6.3 (pre-built for Hadoop 2.6), downloadable from https://spark.apache.org/downloads.html

(b) If yes, how do I configure it?

(c) What should HADOOP_CONF_DIR be for Spark Standalone? The link https://github.com/cloudera/livy mentions the following environment variables:

export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf

I have successfully built Livy except for the last task, which is pending on the Spark installation:

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] livy-api ........................................... SUCCESS [  9.984 s]
[INFO] livy-client-common ................................. SUCCESS [  6.681 s]
[INFO] livy-test-lib ...................................... SUCCESS [  0.647 s]
[INFO] livy-rsc ........................................... SUCCESS [01:08 min]
[INFO] livy-core_2.10 ..................................... SUCCESS [  7.225 s]
[INFO] livy-repl_2.10 ..................................... SUCCESS [02:42 min]
[INFO] livy-core_2.11 ..................................... SUCCESS [ 56.400 s]
[INFO] livy-repl_2.11 ..................................... SUCCESS [03:06 min]
[INFO] livy-server ........................................ SUCCESS [02:12 min]
[INFO] livy-assembly ...................................... SUCCESS [ 15.959 s]
[INFO] livy-client-http ................................... SUCCESS [ 25.377 s]
[INFO] livy-scala-api_2.10 ................................ SUCCESS [ 40.336 s]
[INFO] livy-scala-api_2.11 ................................ SUCCESS [ 40.991 s]
[INFO] minicluster-dependencies_2.10 ...................... SUCCESS [ 24.400 s]
[INFO] minicluster-dependencies_2.11 ...................... SUCCESS [  5.489 s]
[INFO] livy-integration-test .............................. SUCCESS [ 37.473 s]
[INFO] livy-coverage-report ............................... SUCCESS [  3.062 s]
[INFO] livy-examples ...................................... SUCCESS [  6.841 s]
[INFO] livy-python-api .................................... FAILURE [  8.053 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13:59 min
[INFO] Finished at: 2016-11-29T13:14:10-08:00
[INFO] Final Memory: 76M/2758M
[INFO] ------------------------------------------------------------------------

Thank you.

Joshua G asked Nov 29 '16


2 Answers

For future reference, here are the detailed steps you need to follow (Ubuntu):

  1. Install JDK 8
  2. Install Spark (spark-2.4.5-bin-hadoop2.7.tgz) OR (spark-2.4.5-bin-without-hadoop-scala-2.12.tgz)
  3. Install Livy (apache-livy-0.7.0-incubating-bin.zip)
  4. Add these variables to .bashrc:

    export JAVA_HOME="/lib/jvm/jdk1.8.0_251"
    export PATH=$PATH:$JAVA_HOME/bin

    export SPARK_HOME=/opt/hadoop/spark-2.4.5-bin-hadoop2.7
    export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

    export LIVY_HOME=/opt/hadoop/apache-livy-0.7.0-incubating-bin
    export PATH=$PATH:$LIVY_HOME/bin

    export HADOOP_CONF_DIR=/etc/hadoop/conf   # optional

  5. In $LIVY_HOME, create a folder named "logs" and give it write permission; otherwise an error will show up when you start "livy-server".

  6. Start start-master.sh (found in Spark's sbin folder; see the command sketch after step 9).

  7. Start start-slave.sh <master-url> (the master URL can be obtained after step 6 from localhost:8080).
  8. Livy's bin folder contains "livy-server"; just start it.
  9. Livy's UI can now be accessed at localhost:8998.
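
Putting steps 6-8 together, here is a minimal command sketch (paths assume the layout from step 4; spark://abhishek-desktop:7077 is an example master URL, substitute your own):

    # step 6: start the standalone master (web UI on localhost:8080)
    $SPARK_HOME/sbin/start-master.sh

    # step 7: start a worker against the master URL shown on localhost:8080
    $SPARK_HOME/sbin/start-slave.sh spark://abhishek-desktop:7077

    # step 8: start the Livy server (web UI on localhost:8998);
    # run it with no arguments instead to keep it in the foreground
    $LIVY_HOME/bin/livy-server start

Note that with Spark Standalone there is no Hadoop cluster to point at, so the HADOOP_CONF_DIR export from step 4 can be left out; and if you want a default master instead of passing it per job, livy.spark.master in $LIVY_HOME/conf/livy.conf can be set to the same spark:// URL.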

10. Many REST endpoints are available: https://livy.incubator.apache.org/docs/latest/rest-api.html

11. If you are interested in running a JAR, use a batch instead of a session.

12. Create a simple Spark application where the conf's master is passed in as an argument, to make it dynamic (so that you can pass the master URL); a sketch of such an application follows the spark-submit example below.

  1. Try these versions to match the Spark version you installed (if you installed spark-2.4.5-bin-hadoop2.7.tgz):

    scalaVersion := "2.11.12"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5"
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5"

  2. JDK 8 is a must (JDK 11 causes trouble with Scala 2.11.12 and Spark 2.4.5).

  3. The normal spark-submit command, if I keep my JAR file on the Desktop, is:

spark-submit --class com.company.Main file:///home/user_name/Desktop/scala_demo.jar spark://abhishek-desktop:7077
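
For reference, here is a minimal sketch of such an application, assuming the master URL arrives as the first program argument (the package and class name com.company.Main match the commands above; the job body is purely illustrative):

    package com.company

    import org.apache.spark.sql.SparkSession

    object Main {
      def main(args: Array[String]): Unit = {
        // the master URL (e.g. spark://abhishek-desktop:7077) is passed as args(0)
        val master = args(0)

        val spark = SparkSession.builder()
          .appName("scala-demo")
          .master(master)
          .getOrCreate()

        // illustrative job only: count the numbers 0..999 on the cluster
        println(s"count = ${spark.range(0, 1000).count()}")

        spark.stop()
      }
    }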

  4. For Livy, it is:

POST localhost:8998/batches

{
   "className": "com.company.Main",
   "executorMemory": "20g",
   "args": [
       "spark://abhishek-desktop:7077"
   ],
   "file": "local:/home/user_name/Desktop/scala_demo.jar"
}
  5. Executing the above returns a status of "running"; just go to localhost:8998 and check the log for the result.
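
As a usage example, the same batch can be submitted with curl (a sketch; host, port, and JAR path match the request above):

    curl -s -X POST http://localhost:8998/batches \
      -H 'Content-Type: application/json' \
      -d '{
            "className": "com.company.Main",
            "executorMemory": "20g",
            "args": ["spark://abhishek-desktop:7077"],
            "file": "local:/home/user_name/Desktop/scala_demo.jar"
          }'
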
Abhishek Sengupta answered Nov 15 '22


It could be a missing Python module. Take a look at the log of the failing task.

Traceback (most recent call last):
  File "setup.py", line 18, in <module>
    from setuptools import setup
ImportError: No module named setuptools

In this case, you need to install the setuptools module:

pip install setuptools
Collecting setuptools
  Downloading https://files.pythonhosted.org/packages/20/d7/04a0b689d3035143e2ff288f4b9ee4bf6ed80585cc121c90bfd85a1a8c2e/setuptools-39.0.1-py2.py3-none-any.whl (569kB)
    100% |████████████████████████████████| 573kB 912kB/s 
Installing collected packages: setuptools
Successfully installed setuptools-20.7.0
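
Once setuptools is available, re-running the build should let livy-python-api get past this point, e.g. with a standard Maven invocation (the -DskipTests flag is optional and only speeds up the rebuild):

    mvn clean package -DskipTests
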
earandap answered Nov 15 '22