
How do I build Spark from the sources downloaded from the Download Spark page?

I tried to install and build Spark 2.0.0 on an Ubuntu 16.04 VM as follows:

  1. Install Java

    sudo apt-add-repository ppa:webupd8team/java
    sudo apt-get update       
    sudo apt-get install oracle-java8-installer
    
  2. Install Scala

    Go to the Downloads page on the Scala site: scala-lang.org/download/all.html

    I used Scala 2.11.8.

    sudo mkdir /usr/local/src/scala
    sudo tar -xvf scala-2.11.8.tgz -C /usr/local/src/scala/
    

    Modify the ~/.bashrc file and include the path for Scala:

    export SCALA_HOME=/usr/local/src/scala/scala-2.11.8
    export PATH=$SCALA_HOME/bin:$PATH
    

    then type:

    . ~/.bashrc
    
  3. Install git

    sudo apt-get install git
    
  4. Download and build spark

    Go to: http://spark.apache.org/downloads.html

    Download Spark 2.0.0 (Build from Source - for standalone mode).

    tar -xvf spark-2.0.0.tgz
    cd spark-2.0.0
    

    now type:

    ./build/sbt assembly
    

    After the build finishes, I get the message:

    [success] Total time: 1940 s, completed...

    followed by date and time...

  5. Run Spark shell

    bin/spark-shell
    

That's when all hell breaks loose and I start getting the error. I go into the assembly folder to look for a folder called target. But there's no such folder there. The only things visible in assembly are: pom.xml, README, and src.

I looked online for quite a while and haven't been able to find a single concrete solution to this error. Can someone please provide explicit step-by-step instructions on how to solve this?! It's driving me nuts now... (T.T)

Screenshot of the error: (image not shown)

asked Sep 01 '16 by Michael Westen


People also ask

How do you launch Spark applications?

The most common way to launch Spark applications on a cluster is the spark-submit shell command. With spark-submit, the application does not need to be configured separately for each cluster, because the spark-submit script works with the different cluster managers through a single interface.
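
For example, a minimal spark-submit invocation might look roughly like the following (the class name, master URL, and jar path are placeholders, not values taken from this question):

    ./bin/spark-submit \
      --class com.example.MyApp \
      --master local[4] \
      path/to/my-app.jar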

How do I run Spark locally?

It's easy to run Spark locally on one machine: all you need is Java installed on your system PATH, or the JAVA_HOME environment variable pointing to a Java installation. Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+. Java 8 support prior to version 8u201 is deprecated as of Spark 3.2.
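
As a rough illustration, starting the shell in local mode with four worker threads (run from the Spark directory) would look something like:

    ./bin/spark-shell --master "local[4]"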


2 Answers

For some reason, Scala 2.11.8 was not working well for the build, but if I switch over to Scala 2.10.6 it builds properly. I guess the only reason I need Scala in the first place is to get access to sbt so I can build Spark. Once it's built, I go to the Spark folder and type:

build/sbt package

This builds the missing JAR files using Scala 2.11... kind of weird, but that's how it's working (as far as I can tell from the logs).

Once Spark builds again, type bin/spark-shell (while in the Spark folder) and you'll have access to the Spark shell.
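
Put together, the sequence that ended up working was roughly the following (the directory name assumes the extracted spark-2.0.0 source tree):

    cd spark-2.0.0        # top-level Spark source directory
    build/sbt package     # rebuilds the missing JARs
    bin/spark-shell       # start the shell from the same directory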

answered Sep 19 '22 by Michael Westen


Type sbt package in the Spark directory, not in the build directory.
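
A quick sketch of what that means (the directory name assumes the extracted spark-2.0.0 sources):

    cd spark-2.0.0         # run from the top-level Spark directory, not from build/
    build/sbt package      # or plain `sbt package` if sbt is already on your PATH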

answered Sep 17 '22 by Jay