 

A quick guide to a Salt-based install of a Spark cluster

I tried asking this on the official Salt user forum, but for some reason I did not get any assistance there. I am hoping I might get help here.

I am a new user of Salt. I am still evaluating the framework as a candidate for our SCM tool (as opposed to Ansible).

I went through the tutorial, and I am able to successfully manage master-minion/s relationship as covered in the first half of the tutorial.

The tutorial then forks into many different, intricate areas.

What I need is relatively straightforward, so I am hoping that perhaps someone can guide me here on how to accomplish it.

I am looking to install Spark and HDFS on 20 RHEL 7 machines (let's say in the range 168.192.10.0-20, where .0 is the name node).

I see:

https://github.com/saltstack-formulas/hadoop-formula

and I found a third-party Spark formula:

https://github.com/beauzeaux/spark-formula

Could someone be kind enough to suggest a set of instructions on how to go about this install in the most straightforward way?

Edmon asked Dec 27 '15

1 Answer

Disclaimer: This answer describes only the rough process of what you need to do. I've distilled it from the respective documentation chapters and added the sources for reference. I'm assuming that you are familiar with the basic workings of Salt (states and pillars and whatnot) and also with Hadoop (I'm not).

1. Configure GitFS

The typical way to install Salt formulas is using GitFS. See the respective chapter from the Salt manual for in-depth documentation.

This needs to be done on your Salt master node.

  1. Enable GitFS in the master configuration file (typically /etc/salt/master, or a separate file in /etc/salt/master.d). Note that GitFS requires a Python Git provider, pygit2 or GitPython, to be installed on the master:

    fileserver_backend:
      - git
    
  2. Add the two Salt formulas that you need as remotes (same file). This is also covered in the documentation:

    gitfs_remotes:
      - https://github.com/saltstack-formulas/hadoop-formula.git
      - https://github.com/beauzeaux/spark-formula
    
  3. (optional): Note the following warning from the Formula documentation:

    We strongly recommend forking a formula repository into your own GitHub account to avoid unexpected changes to your infrastructure.

    Many Salt Formulas are highly active repositories so pull new changes with care. Plus any additions you make to your fork can be easily sent back upstream with a quick pull request!

    Fork the formulas into your own Git repository (using GitHub or otherwise) and use your private Git URLs as remotes in order to prevent unexpected changes to your configuration (a sketch follows this list).

  4. Restart the Salt master (the RHEL 7 command is shown after this list).
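
A minimal sketch of step 3; the your-org URLs are placeholders for wherever you host your forks:

    gitfs_remotes:
      - https://github.com/your-org/hadoop-formula.git
      - https://github.com/your-org/spark-formula.git

For step 4, RHEL 7 uses systemd, so restarting the master looks like this:

    systemctl restart salt-master

You can then verify that the formulas are being served with the fileserver runner:

    salt-run fileserver.file_list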

2. Install Hadoop

This is documented in depth in the formula's README file. From a cursory reading, the formula can set up both Hadoop masters and slaves; the role is determined using a Salt grain.

  1. Configure the Hadoop role in the file /etc/salt/grains. This needs to be done on each Salt minion node (use hadoop_master and hadoop_slave appropriately; a way to set this remotely from the master is sketched after this list):

    roles:
      - hadoop_master
    
  2. Configure the Salt mine on each of your Salt minions (typically in /etc/salt/minion or a separate file in /etc/salt/minion.d; see the note after this list on applying this change):

    mine_functions:
      network.interfaces: []
      network.ip_addrs: []
      grains.items: []
    
  3. Have a look at additional configuration grains and set them as you see fit.

  4. Add the required pillar data for configuring your Hadoop setup; we're back on the Salt master node for this (I'm assuming you are familiar with states and pillars; see the manual or this walkthrough otherwise). Have a look at the example pillar for possible configuration options; a sketch of the pillar wiring follows this list.

  5. Use the hadoop and hadoop.hdfs states in your top.sls (typically /srv/salt/top.sls):

    base:
      'your-hadoop-hostname*':
        - hadoop
        - hadoop.hdfs
    
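
Setting grains by hand on 20 machines gets tedious. The roles grain from step 1 can also be set remotely from the master; a sketch, assuming hypothetical minion IDs node0 (the name node) through node19:

    salt 'node0' grains.append roles hadoop_master
    salt 'node[1-9]*' grains.append roles hadoop_slave

grains.append persists the value on the targeted minions, so the effect is the same as editing /etc/salt/grains directly.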
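
The mine configuration from step 2 only takes effect once the minion configuration is reloaded, so restart the salt-minion service on each node after editing it. You can then force the mine data to be populated right away from the master:

    salt '*' mine.update

Alternatively, mine_functions can be distributed via pillar instead of the minion config files, which avoids touching every node.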
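
For step 4, pillar data is wired up much like states: a pillar top file assigns pillar SLS files to your minions. A minimal sketch, assuming the default pillar root of /srv/pillar and a hadoop.sls that you start from a copy of the formula's pillar.example:

    # /srv/pillar/top.sls
    base:
      'your-hadoop-hostname*':
        - hadoop

After editing pillar data, refresh it on the minions:

    salt '*' saltutil.refresh_pillar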

3. Install Spark

  1. According to the formula's README, there's nothing to configure via grains or pillars, so all that's left is to add the spark state to your top.sls:

    base:
      'your-hadoop-hostname*':
        - hadoop
        - hadoop.hdfs
        - spark
    

4. Fire!

Apply all states:

    salt 'your-hadoop-hostname*' state.highstate
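
If you'd rather see what would change before touching the machines, append test=True for a dry run:

    salt 'your-hadoop-hostname*' state.highstate test=True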

helmbert answered Sep 26 '22