I am trying to set up Apache Spark on Windows.
After searching a bit, I understand that standalone mode is what I want. Which binaries do I download in order to run Apache Spark on Windows? I see distributions with Hadoop and CDH at the Spark download page.
I can't find references to this on the web. A step-by-step guide would be highly appreciated.
Spark with winutils.exe on Windows

To run Apache Spark on Windows, you need winutils.exe, because Spark performs POSIX-like file access operations on Windows through the Windows API. winutils.exe enables Spark to use Windows-specific services, including running shell commands in a Windows environment.

Step 5: Open a command prompt and go to your Spark bin folder (type cd C:\Users\Desktop\A\spark\bin). Type spark-shell. It will show some warnings and errors, but you can ignore them. It works.
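A minimal command-prompt sketch of that setup, assuming winutils.exe was copied to C:\hadoop\bin and Spark was extracted to C:\spark (both paths are just examples, not fixed locations):

    rem Assumed layout: winutils.exe in C:\hadoop\bin, Spark extracted to C:\spark
    set HADOOP_HOME=C:\hadoop
    set PATH=%HADOOP_HOME%\bin;%PATH%

    rem Launch the Spark shell from the Spark bin folder
    cd /d C:\spark\bin
    spark-shell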
Steps to install Spark in local mode (a command-prompt sketch of the environment-variable setup follows the list):

1. Install Java 7 or later. To test that the Java installation is complete, open a command prompt, type java and hit Enter. If you receive the message 'java' is not recognized as an internal or external command, you need to configure the JAVA_HOME and PATH environment variables to point to the path of the JDK.
2. Download and install Scala. Set SCALA_HOME in Control Panel\System and Security\System, go to "Advanced system settings" and add %SCALA_HOME%\bin to the PATH variable under environment variables.
3. Install Python 2.6 or later from the Python download link.
4. Download SBT. Install it and set SBT_HOME as an environment variable with the value <<SBT PATH>>.
5. Download winutils.exe from the HortonWorks repo or a git repo. Since we don't have a local Hadoop installation on Windows, we have to download winutils.exe and place it in a bin directory under a created Hadoop home directory. Set HADOOP_HOME = <<Hadoop home directory>> as an environment variable.
6. We will be using a pre-built Spark package, so choose a Spark pre-built package for Hadoop from the Spark download page. Download and extract it.
7. Set SPARK_HOME and add %SPARK_HOME%\bin to the PATH variable under environment variables.
8. Run the command: spark-shell
9. Open http://localhost:4040/ in a browser to see the SparkContext web UI.
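Putting the environment-variable steps together, here is a minimal command-prompt sketch. All of the install paths below (JDK, Scala, SBT, Hadoop home, Spark) are assumptions; substitute the directories you actually used. Note that set only affects the current command prompt session; to make the variables permanent, add them under "Advanced system settings" -> Environment Variables as described above.

    rem Assumed install locations -- adjust these to your machine
    set JAVA_HOME=C:\Program Files\Java\jdk1.8.0
    set SCALA_HOME=C:\Program Files (x86)\scala
    set SBT_HOME=C:\Program Files (x86)\sbt
    set HADOOP_HOME=C:\hadoop
    set SPARK_HOME=C:\spark

    rem Put the relevant bin folders on PATH for this session
    set PATH=%JAVA_HOME%\bin;%SCALA_HOME%\bin;%SBT_HOME%\bin;%HADOOP_HOME%\bin;%SPARK_HOME%\bin;%PATH%

    rem Quick sanity checks, then launch the shell
    java -version
    scala -version
    spark-shell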
I found that the easiest solution on Windows is to build from source.
You can pretty much follow this guide: http://spark.apache.org/docs/latest/building-spark.html
Download and install Maven, and set MAVEN_OPTS
to the value specified in the guide.
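As a rough sketch, the build described in that guide looks like this in a Windows command prompt. The MAVEN_OPTS value and Maven goals below are the ones the guide has listed; check the page for the values matching your Spark release, and C:\spark-source is just an assumed location for the extracted source:

    rem Memory settings for the Maven build (values from the building-spark guide; verify for your version)
    set MAVEN_OPTS=-Xmx2g -XX:ReservedCodeCacheSize=512m

    rem Build Spark from the extracted source directory, skipping tests
    cd /d C:\spark-source
    mvn -DskipTests clean package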
But if you're just playing around with Spark, and don't actually need it to run on Windows for any reason other than that your own machine runs Windows, I'd strongly suggest you install Spark on a Linux virtual machine. The simplest way to get started is probably to download the ready-made images from Cloudera or Hortonworks and either use the bundled version of Spark, or install your own from source or from the compiled binaries you can get from the Spark website.