On the machine where I installed Apache Livy (Ubuntu 16.04):
(a) Is it possible to run it on Spark Standalone mode?
I am thinking of using Spark 1.6.3, Pre-built for Hadoop 2.6, downloadable from https://spark.apache.org/downloads.html
(b) If yes, how do I configure it?
(c) What should HADOOP_CONF_DIR be for Spark Standalone? The page https://github.com/cloudera/livy mentions the following environment variables:
export SPARK_HOME=/usr/lib/spark
export HADOOP_CONF_DIR=/etc/hadoop/conf
I have successfully built Livy except for the last task, which is pending on the Spark installation:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] livy-api ........................................... SUCCESS [ 9.984 s]
[INFO] livy-client-common ................................. SUCCESS [ 6.681 s]
[INFO] livy-test-lib ...................................... SUCCESS [ 0.647 s]
[INFO] livy-rsc ........................................... SUCCESS [01:08 min]
[INFO] livy-core_2.10 ..................................... SUCCESS [ 7.225 s]
[INFO] livy-repl_2.10 ..................................... SUCCESS [02:42 min]
[INFO] livy-core_2.11 ..................................... SUCCESS [ 56.400 s]
[INFO] livy-repl_2.11 ..................................... SUCCESS [03:06 min]
[INFO] livy-server ........................................ SUCCESS [02:12 min]
[INFO] livy-assembly ...................................... SUCCESS [ 15.959 s]
[INFO] livy-client-http ................................... SUCCESS [ 25.377 s]
[INFO] livy-scala-api_2.10 ................................ SUCCESS [ 40.336 s]
[INFO] livy-scala-api_2.11 ................................ SUCCESS [ 40.991 s]
[INFO] minicluster-dependencies_2.10 ...................... SUCCESS [ 24.400 s]
[INFO] minicluster-dependencies_2.11 ...................... SUCCESS [ 5.489 s]
[INFO] livy-integration-test .............................. SUCCESS [ 37.473 s]
[INFO] livy-coverage-report ............................... SUCCESS [ 3.062 s]
[INFO] livy-examples ...................................... SUCCESS [ 6.841 s]
[INFO] livy-python-api .................................... FAILURE [ 8.053 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 13:59 min
[INFO] Finished at: 2016-11-29T13:14:10-08:00
[INFO] Final Memory: 76M/2758M
[INFO] ------------------------------------------------------------------------
Thank you.
For future reference, here are the detailed steps to follow (Ubuntu):
Add these variables to .bashrc:
export JAVA_HOME="/lib/jvm/jdk1.8.0_251"
export PATH=$PATH:$JAVA_HOME/bin
export SPARK_HOME=/opt/hadoop/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export LIVY_HOME=/opt/hadoop/apache-livy-0.7.0-incubating-bin
export PATH=$PATH:$LIVY_HOME/bin
export HADOOP_CONF_DIR=/etc/hadoop/conf   # optional
At $LIVY_HOME, create a folder named "logs" and give it write permissions; otherwise an error will show up when you start livy-server.
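For example (755 is just one reasonable choice of permissions):
mkdir -p $LIVY_HOME/logs
chmod 755 $LIVY_HOME/logs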
Now run start-master.sh (present in Spark's sbin folder).
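To point Livy at the standalone master instead of YARN, set livy.spark.master in $LIVY_HOME/conf/livy.conf. A minimal sketch (abhishek-desktop:7077 is just the example master URL used below):
livy.spark.master = spark://abhishek-desktop:7077
Then start Livy:
$LIVY_HOME/bin/livy-server start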
Many REST endpoints are available: https://livy.incubator.apache.org/docs/latest/rest-api.html
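For example, with Livy running on its default port 8998, you can list existing sessions and batches with:
curl http://localhost:8998/sessions
curl http://localhost:8998/batches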
If you want to run a JAR, use a batch instead of a session.
Create a simple Spark application in which the conf's master is passed as an argument, so that it stays dynamic and you can pass the master URL at submit time (see the sketch after the version list below).
Try these versions to work with the Spark version you installed (if you installed spark-2.4.5-bin-hadoop2.7.tgz):
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5"
JDK 8 is a must (JDK 11 causes trouble with Scala 2.11.12 and Spark 2.4.5).
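A minimal sketch of such an application (the com.company.Main class name and the trivial job are illustrative, matching the spark-submit example below):

package com.company

import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    // The master URL (e.g. spark://abhishek-desktop:7077) is passed as the first argument
    val spark = SparkSession.builder()
      .appName("scala-demo")
      .master(args(0))
      .getOrCreate()

    // Trivial job just to verify the cluster is reachable
    val evens = spark.sparkContext.parallelize(1 to 1000).filter(_ % 2 == 0).count()
    println(s"Even numbers: $evens")

    spark.stop()
  }
}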
The normal spark-submit command, if I keep my JAR file on the Desktop, is:
spark-submit --class com.company.Main file:///home/user_name/Desktop/scala_demo.jar spark://abhishek-desktop:7077
POST localhost:8998/batches
{
"className": "com.company.Main",
"executorMemory": "20g",
"args": [
"spark://abhishek-desktop:7077"
],
"file": "local:/home/user_name/Desktop/scala_demo.jar"
}
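For example, the same request submitted with curl (assuming Livy's default port 8998):
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"className": "com.company.Main", "executorMemory": "20g", "args": ["spark://abhishek-desktop:7077"], "file": "local:/home/user_name/Desktop/scala_demo.jar"}' \
  http://localhost:8998/batches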
This could be a missing Python module; take a look at the failure log.
Traceback (most recent call last):
File "setup.py", line 18, in <module>
from setuptools import setup
ImportError: No module named setuptools
In this case, you need to install the setuptools module:
pip install setuptools
Collecting setuptools
Downloading https://files.pythonhosted.org/packages/20/d7/04a0b689d3035143e2ff288f4b9ee4bf6ed80585cc121c90bfd85a1a8c2e/setuptools-39.0.1-py2.py3-none-any.whl (569kB)
100% |████████████████████████████████| 573kB 912kB/s
Installing collected packages: setuptools
Successfully installed setuptools-20.7.0