I have been developing in PySpark with Spark in standalone, non-cluster mode. Now I would like to explore Spark's cluster mode. I searched the internet and found that I may need a cluster manager, such as Apache Mesos or Spark Standalone, to run clusters across different machines, but I couldn't easily find the details of the overall picture.
How should I set things up, from a system design point of view, in order to run Spark clusters across multiple Windows machines (or multiple Windows VMs)?
Once connected, Spark acquires executors on nodes in the cluster, which are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks to the executors to run.
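As an illustration, in PySpark that connection is made when you create the SparkSession (and its SparkContext) with the cluster's master URL. Here is a minimal sketch, assuming a standalone master is reachable at spark://localhost:7077:

# Minimal PySpark sketch: the driver connects to a standalone master
# (assumed to be at spark://localhost:7077); Spark then acquires executors
# on the workers and ships the tasks below to them.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("spark://localhost:7077")
         .appName("cluster-mode-demo")
         .getOrCreate())

# A trivial distributed computation executed on the executors.
rdd = spark.sparkContext.parallelize(range(1000))
print(rdd.map(lambda x: x * x).sum())

spark.stop()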
To launch a Spark standalone cluster with the launch scripts, you should create a file called conf/workers in your Spark directory, which must contain the hostnames of all the machines where you intend to start Spark workers, one per line.
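For example, a conf/workers file for three machines could look like this (the hostnames are placeholders for your own machines):

# conf/workers -- one Spark worker host per line
worker-host-1
worker-host-2
worker-host-3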
You may want to explore (from the simplest) Spark Standalone, through Hadoop YARN to Apache Mesos or DC/OS. See Cluster Mode Overview.
I'd recommend using Spark Standalone first (as the easiest option to submit Spark applications to). Spark Standalone is included in any Spark installation and works fine on Windows. The issue is that there are no scripts to start and stop the standalone Master and Workers (aka slaves) for Windows OS. You simply have to "code" them yourself.
Use the following to start a standalone Master on Windows:
rem terminal 1
bin\spark-class org.apache.spark.deploy.master.Master
Please note that after you start the standalone Master you get no output, but don't worry; head over to http://localhost:8080/ to see the web UI of the Spark Standalone cluster.
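If the Master runs on a different machine than the Workers, you may want to bind it to an address the other machines can reach. As a sketch (the IP address is just a placeholder for the master machine's address):

rem terminal 1 (on the master machine; 192.168.1.10 is a placeholder)
bin\spark-class org.apache.spark.deploy.master.Master --host 192.168.1.10 --port 7077

Workers on other machines would then connect to spark://192.168.1.10:7077 instead of spark://localhost:7077.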
In a separate terminal, start an instance of the standalone Worker:
rem terminal 2
bin\spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077
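To add more machines to the cluster, run the same Worker command on every Windows box that should act as a Worker, pointing it at the Master's URL (shown at the top of the Master's web UI). Again, the address below is a placeholder:

rem on each additional worker machine
bin\spark-class org.apache.spark.deploy.worker.Worker spark://192.168.1.10:7077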
With a one-worker Spark Standalone cluster up, you should be able to submit Spark applications as follows:
spark-submit --master spark://localhost:7077 ...
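For example, submitting a PySpark application could look like this (my_app.py is a hypothetical script of yours):

spark-submit --master spark://localhost:7077 my_app.py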
Read Spark Standalone Mode in the official documentation of Spark.
As I just found out, Mesos is not an option given its System Requirements:
Mesos runs on Linux (64 Bit) and Mac OS X (64 Bit).
You could, however, run any of these clusters in virtual machines using VirtualBox or similar. At least DC/OS has dcos-vagrant, which should make it fairly easy:
dcos-vagrant: Quickly provision a DC/OS cluster on a local machine for development, testing, or demonstration.
Deploying DC/OS Vagrant involves creating a local cluster of VirtualBox VMs using the dcos-vagrant-box base image and then installing DC/OS.
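Very roughly, and assuming the standard workflow from the dcos-vagrant README, that boils down to cloning the repository and bringing the VMs up (the README describes additional configuration, such as a VagrantConfig file, that may be required first):

git clone https://github.com/dcos/dcos-vagrant
cd dcos-vagrant
vagrant up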