 

Installing Apache Spark on Ubuntu 14.04

I have a VM that I access from Ubuntu, and the VM itself also runs Ubuntu 14.04. I need to install Apache Spark as soon as possible, but I cannot find anything that helps me or points me to where it is best explained. I tried once to install it on my local Ubuntu 14.04 machine, but it failed. Note that I don't want to install it on a cluster. Any help, please?

asked May 27 '15 by JPerk

People also ask

How do I download Apache spark on Ubuntu?

Use the wget command to download Apache Spark to your Ubuntu server. Once your download is complete, untar the archive file contents using the tar command (tar is a file-archiving tool). Once the untar completes, rename the folder to spark.
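As a rough sketch of those three steps (the release version and archive URL here are assumptions; check https://spark.apache.org/downloads.html for the current release before copying):

$ wget https://archive.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz
$ tar xvf spark-2.0.0-bin-hadoop2.7.tgz
$ mv spark-2.0.0-bin-hadoop2.7 spark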


1 Answer

This post explains the detailed steps to set up Apache Spark 2.0 on an Ubuntu/Linux machine. To run Spark, the machine must have Java and Scala installed. Spark can be installed with or without Hadoop; in this post we deal only with installing Spark 2.0 standalone. Installing Spark 2.0 over Hadoop is explained in another post. We will also cover how to set up Jupyter notebooks for running Spark applications in Python with the pyspark module. So, let's start by checking for and installing Java and Scala.

$ scala -version
$ java -version

These commands print the versions if Scala and Java are already installed; otherwise, you can install them with the following commands.

$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
$ wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz
$ sudo mkdir /usr/local/scala
$ sudo tar xvf scala-2.10.4.tgz -C /usr/local/scala/
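Note that oracle-java8-installer is not in the stock Ubuntu 14.04 repositories; at the time it was typically provided by the WebUpd8 PPA. A sketch assuming that PPA (it has since been discontinued, so verify it is still available before relying on it):

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer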

You can check again with the -version commands that Java and Scala are installed properly. Scala should display

Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL

and Java should display

java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b14, mixed mode)

Then update the .bashrc file by adding these lines at the end:

export SCALA_HOME=/usr/local/scala/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH

And reload .bashrc with this command:

$ . .bashrc
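As a quick sanity check that the new environment took effect:

$ echo $SCALA_HOME    # should print /usr/local/scala/scala-2.10.4
$ scala -version      # should report Scala code runner version 2.10.4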

Installing Spark

First download Spark from https://spark.apache.org/downloads.html using these options: Spark release: 2.0.0, package type: pre-built for Hadoop 2.7, and Direct Download.
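If you would rather fetch it from the shell than the browser, something like this should work (the archive URL pattern is an assumption; verify it against the downloads page):

$ wget https://archive.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.7.tgz -P $HOME/Downloads/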

Now, go to $HOME/Downloads and use the following commands to extract the Spark tar file and move it to the given location.

$ cd $HOME/Downloads/
$ tar xvf spark-2.0.0-bin-hadoop2.7.tgz
$ sudo mv spark-2.0.0-bin-hadoop2.7 /usr/local/spark

Add the following lines to the ~/.bashrc file. This adds the location where the Spark software files are located to the PATH variable.

export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH

Again reload the .bashrc environment using either of these commands:

$ source ~/.bashrc
$ . .bashrc

Now you can start a Spark shell using these commands:

$ spark-shell    # for starting the Scala API
$ pyspark        # for starting the Python API
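Finally, as mentioned at the top, you can run Spark's Python API inside a Jupyter notebook. A minimal sketch, assuming Python 2 pip on Ubuntu 14.04 and using pyspark's standard driver environment variables:

$ sudo apt-get install python-pip
$ pip install jupyter
$ export PYSPARK_DRIVER_PYTHON=jupyter
$ export PYSPARK_DRIVER_PYTHON_OPTS=notebook
$ pyspark    # now launches a Jupyter notebook backed by Spark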
answered Oct 19 '22 by Abir J.