I tried to install and build Spark 2.0.0 on an Ubuntu 16.04 VM as follows:
Install Java
sudo apt-add-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
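To confirm the install worked, run java -version (it prints to stderr, hence the 2>&1). Here's a minimal sketch that pulls the major version out of that output; java_major_version is a hypothetical helper, and it assumes the first line contains version "…" in either the old 1.8.0_xxx scheme or the newer 11.0.x scheme:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `java -version` output on stdin, prints the major version.
java_major_version() {
  local ver
  # Extract the quoted version string, e.g. 1.8.0_101 or 11.0.2
  ver=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) echo "$ver" | cut -d. -f2 ;;  # legacy scheme: 1.8.0_101 -> 8
    *)   echo "$ver" | cut -d. -f1 ;;  # newer scheme: 11.0.2 -> 11
  esac
}
```

Usage: java -version 2>&1 | java_major_version should print 8 after the steps above.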
Install Scala
Go to the Downloads tab on the Scala site: scala-lang.org/download/all.html
I used Scala 2.11.8.
sudo mkdir /usr/local/src/scala
sudo tar -xvf scala-2.11.8.tgz -C /usr/local/src/scala/
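If you'd rather script the download than click through the site, here's a minimal sketch. The function names are hypothetical, and the download host (downloads.lightbend.com) is an assumption about where the archive lives; check it against the link on the downloads page:

```shell
#!/usr/bin/env bash
# Assumed download location; verify against scala-lang.org/download/all.html
scala_tarball_url() {
  local version="$1"
  echo "https://downloads.lightbend.com/scala/${version}/scala-${version}.tgz"
}

# Hypothetical helper: downloads the archive and extracts it under PREFIX
install_scala() {
  local version="$1" prefix="${2:-/usr/local/src/scala}"
  local tarball="/tmp/scala-${version}.tgz"
  wget -q "$(scala_tarball_url "$version")" -O "$tarball" || return 1
  sudo mkdir -p "$prefix"
  sudo tar -xzf "$tarball" -C "$prefix"
}
```

Usage: install_scala 2.11.8 should leave Scala in /usr/local/src/scala/scala-2.11.8, the same place as the manual steps.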
Modify your .bashrc file and add the Scala path:
export SCALA_HOME=/usr/local/src/scala/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
Then reload it:
. ~/.bashrc
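Editing .bashrc by hand is easy to get wrong, and re-running a setup script tends to pile up duplicate lines. Here's a minimal sketch that appends the two export lines only if they aren't already there; add_scala_to_bashrc is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: add_scala_to_bashrc BASHRC_PATH SCALA_HOME_DIR
add_scala_to_bashrc() {
  local bashrc="$1" scala_home="$2"
  local home_line="export SCALA_HOME=$scala_home"
  # Single quotes keep $SCALA_HOME and $PATH literal so they expand at login time
  local path_line='export PATH=$SCALA_HOME/bin:$PATH'
  # grep -qxF matches a whole line literally, so repeated runs add nothing new
  grep -qxF "$home_line" "$bashrc" 2>/dev/null || echo "$home_line" >> "$bashrc"
  grep -qxF "$path_line" "$bashrc" 2>/dev/null || echo "$path_line" >> "$bashrc"
}
```

Usage: add_scala_to_bashrc ~/.bashrc /usr/local/src/scala/scala-2.11.8 && . ~/.bashrc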
Install git
sudo apt-get install git
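Before building, it can help to confirm that everything installed so far is actually on the PATH. A minimal sketch; require_cmds is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns non-zero and lists every command not found on PATH
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: require_cmds java scala git || echo "fix the installs above before building"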
Download and build Spark
Go to: http://spark.apache.org/downloads.html
Download Spark 2.0.0 (Build from Source - for standalone mode).
tar -xvf spark-2.0.0.tgz
cd into the extracted Spark folder, then type:
./build/sbt assembly
When the build finishes, I get the message:
[success] Total time: 1940 s, completed...
followed by the date and time.
Run the Spark shell
bin/spark-shell
That's when all hell breaks loose and I start getting the error. I looked in the assembly folder for a folder called target, but there is no such folder; the only things in assembly are pom.xml, README, and src.
I've searched online for quite a while and haven't found a single concrete solution. Can someone please provide explicit step-by-step instructions for solving this? It's driving me nuts... (T.T)
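The missing target folder is the likely culprit. As far as I understand it, the Spark 2.x launcher script (bin/spark-class) looks for the built JARs either in a top-level jars directory (pre-built downloads) or in assembly/target/scala-<version>/jars (source builds), and exits if neither exists. Here's a minimal sketch of an equivalent check; spark_jars_dir is a hypothetical helper and the default Scala version 2.11 is an assumption:

```shell
#!/usr/bin/env bash
# Hypothetical helper: spark_jars_dir SPARK_HOME [SCALA_BINARY_VERSION]
spark_jars_dir() {
  local spark_home="$1" scala_version="${2:-2.11}"
  local dir
  if [ -d "$spark_home/jars" ]; then
    dir="$spark_home/jars"                                   # pre-built distribution layout
  else
    dir="$spark_home/assembly/target/scala-$scala_version/jars"  # source-build layout
  fi
  if [ ! -d "$dir" ]; then
    echo "No Spark jars directory at $dir; run build/sbt package first" >&2
    return 1
  fi
  echo "$dir"
}
```

Usage: spark_jars_dir "$PWD" from the Spark folder; if it fails, bin/spark-shell will fail too.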
Screenshot of the error: (image not available)
The most common way to launch Spark applications on a cluster is the spark-submit shell script. Because spark-submit talks to all supported cluster managers through a single interface, an application does not need to be configured separately for each cluster.
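As an illustration, here's a minimal sketch that submits the bundled SparkPi example locally. The helper name is hypothetical, and the two jar search paths (examples/jars for pre-built downloads, examples/target/scala-*/jars for source builds) are assumptions about where the examples JAR ends up:

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit_spark_pi SPARK_HOME [MASTER]
submit_spark_pi() {
  local spark_home="$1" master="${2:-local[2]}"
  local jar
  # Assumed examples-jar locations; unmatched patterns are silently skipped
  jar=$(ls "$spark_home"/examples/jars/spark-examples_*.jar \
           "$spark_home"/examples/target/scala-*/jars/spark-examples_*.jar 2>/dev/null | head -n 1)
  if [ -z "$jar" ]; then
    echo "spark-examples jar not found under $spark_home/examples" >&2
    return 1
  fi
  # Same command works for other masters, e.g. spark://host:7077 or yarn
  "$spark_home/bin/spark-submit" \
    --class org.apache.spark.examples.SparkPi \
    --master "$master" \
    "$jar" 10
}
```

Switching clusters only means changing the MASTER argument; the application itself stays the same.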
It's easy to run Spark locally on one machine: all you need is Java on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation. Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+. Support for Java 8 versions prior to 8u201 is deprecated as of Spark 3.2. (These requirements are for recent Spark 3.x releases.)
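To check whether an installed Java 8 falls under that deprecation, here's a minimal sketch; is_deprecated_java8 is a hypothetical helper, and it assumes the 1.8.0_<update> version format that Java 8 prints:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if $1 is a Java 8 version older than 8u201
is_deprecated_java8() {
  local ver="$1" update
  case "$ver" in
    1.8.0_*)
      update="${ver#1.8.0_}"
      update="${update%%[!0-9]*}"   # drop suffixes such as "-b12"
      [ "${update:-0}" -lt 201 ]
      ;;
    *) return 1 ;;                  # not Java 8, so this rule does not apply
  esac
}
```

Usage: is_deprecated_java8 "$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)" && echo "consider upgrading Java 8"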
For some reason, Scala 2.11.8 does not work well for the build, but if I switch to Scala 2.10.6 it builds properly. I guess the reason I need Scala in the first place is to get access to sbt so I can build Spark. Once it's built, I go to the Spark folder and type:
build/sbt package
This builds the missing JAR files using Scala 2.11. Kind of weird, but that's how it seems to work (judging from the logs).
Once Spark has built again, run bin/spark-shell from the Spark folder and you'll have access to the Spark shell.
Run sbt package (build/sbt package) from the Spark directory itself, not from inside the build directory.
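That advice boils down to invoking the build script from the top-level Spark directory. Here's a minimal sketch that enforces it; build_spark is a hypothetical helper, and it assumes the source tree ships an executable build/sbt as Spark 2.0.0 does:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build_spark SPARK_SOURCE_DIR
build_spark() {
  local spark_home="$1"
  # Refuse to run from the wrong directory (e.g. from inside build/)
  if [ ! -x "$spark_home/build/sbt" ]; then
    echo "$spark_home does not look like a Spark source tree (no build/sbt)" >&2
    return 1
  fi
  # Subshell so the caller's working directory is left unchanged
  (cd "$spark_home" && ./build/sbt package)
}
```

Usage: build_spark ~/spark-2.0.0 && ~/spark-2.0.0/bin/spark-shell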