Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to set up Spark on Windows?

I am trying to setup Apache Spark on Windows.

After searching a bit, I understand that the standalone mode is what I want. Which binaries do I download in order to run Apache spark in windows? I see distributions with hadoop and cdh at the spark download page.

I don't have references in web to this. A step by step guide to this is highly appreciated.

like image 958
Siva Avatar asked Aug 25 '14 07:08

Siva


People also ask

How do I use spark on Windows?

Spark with winutils.exe on WindowsTo run Apache Spark on windows, you need winutils.exe as it uses POSIX like file access operations in windows using windows API. winutils.exe enables Spark to use Windows-specific services including running shell commands on a windows environment.

Where is spark PATH in Windows?

Step 5: Open command prompt and go to your spark bin folder (type cd C:\Users\Desktop\A\spark\bin). Type spark-shell. It will show some warnings and errors but ignore. It works.


2 Answers

Steps to install Spark in local mode:

  1. Install Java 7 or later. To test java installation is complete, open command prompt type java and hit enter. If you receive a message 'Java' is not recognized as an internal or external command. You need to configure your environment variables, JAVA_HOME and PATH to point to the path of jdk.

  2. Download and install Scala.

    Set SCALA_HOME in Control Panel\System and Security\System goto "Adv System settings" and add %SCALA_HOME%\bin in PATH variable in environment variables.

  3. Install Python 2.6 or later from Python Download link.

  4. Download SBT. Install it and set SBT_HOME as an environment variable with value as <<SBT PATH>>.

  5. Download winutils.exe from HortonWorks repo or git repo. Since we don't have a local Hadoop installation on Windows we have to download winutils.exe and place it in a bin directory under a created Hadoop home directory. Set HADOOP_HOME = <<Hadoop home directory>> in environment variable.

  6. We will be using a pre-built Spark package, so choose a Spark pre-built package for Hadoop Spark download. Download and extract it.

    Set SPARK_HOME and add %SPARK_HOME%\bin in PATH variable in environment variables.

  7. Run command: spark-shell

  8. Open http://localhost:4040/ in a browser to see the SparkContext web UI.

like image 176
Ani Menon Avatar answered Sep 26 '22 13:09

Ani Menon


I found the easiest solution on Windows is to build from source.

You can pretty much follow this guide: http://spark.apache.org/docs/latest/building-spark.html

Download and install Maven, and set MAVEN_OPTS to the value specified in the guide.

But if you're just playing around with Spark, and don't actually need it to run on Windows for any other reason that your own machine is running Windows, I'd strongly suggest you install Spark on a linux virtual machine. The simplest way to get started probably is to download the ready-made images made by Cloudera or Hortonworks, and either use the bundled version of Spark, or install your own from source or the compiled binaries you can get from the spark website.

like image 22
jkgeyti Avatar answered Sep 26 '22 13:09

jkgeyti