How do I install Hadoop and Pydoop on a fresh Ubuntu instance?

Question

Most of the setup instructions I have found are verbose. Is there a short, script-like set of commands that we can simply run to set up Hadoop and Pydoop on an Ubuntu instance on Amazon EC2?

Samuel Cozannet · Accepted Answer

Another solution is to use Juju, Ubuntu's service orchestration framework.

First, install the Juju client on your local machine:

sudo add-apt-repository ppa:juju/stable
sudo apt-get update && sudo apt-get install juju-core

(Installation instructions for macOS and Windows are also available on the Juju website.)

Then generate a configuration file:

juju generate-config

Edit the generated file (~/.juju/environments.yaml) and add the credentials for your preferred cloud (AWS, Azure, GCE, ...). Since you are on Amazon EC2, I assume you use AWS, so follow the Juju documentation for configuring the Amazon EC2 provider.

Note: the steps above only need to be done once.
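The one-time client setup can be wrapped in a bash function so it runs as a single script. This is a sketch, assuming an Ubuntu workstation and the Juju 1.x juju-core package from the PPA shown above. The function name juju_client_setup is hypothetical, and -y is added so add-apt-repository and apt-get install do not stop to ask for confirmation. Adding your credentials to the generated file remains a manual step.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for the one-time Juju client setup (Juju 1.x).
juju_client_setup() {
  # Add the stable Juju PPA without the interactive confirmation prompt.
  sudo add-apt-repository -y ppa:juju/stable || return
  sudo apt-get update || return
  sudo apt-get install -y juju-core || return
  # Create the template environments file; the AWS credentials still
  # have to be filled in by hand before bootstrapping.
  juju generate-config
}
```

Source the file and run juju_client_setup once per workstation.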

Now bootstrap the environment:

juju bootstrap amazon

Optionally, deploy the Juju GUI (like the demo available on the Juju website):

juju deploy --to 0 juju-gui && juju expose juju-gui

You can find the GUI's address and the login password with:

juju api-endpoints | cut -f1 -d":"
grep pass ~/.juju/environments/amazon.jenv
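The two commands above print raw output that still has to be turned into a URL and a password. The helpers below are a sketch that does that from stdin. They assume the GUI runs on the same machine as the API server (it was deployed with --to 0), that it is served over HTTPS on the default port, and that the .jenv file stores the password on a line of the form "password: VALUE", which is what grep pass is looking for. The names gui_urls and jenv_password are hypothetical.

```bash
#!/usr/bin/env bash
# Turn "host:port" lines (as printed by "juju api-endpoints") into
# "https://host/" URLs. Assumes the GUI shares machine 0 with the API
# server and listens on the default HTTPS port.
gui_urls() {
  local host
  cut -f1 -d":" | while IFS= read -r host; do
    if [ -n "$host" ]; then
      echo "https://$host/"
    fi
  done
}

# Print the value on the first line whose first field is "password:".
# Assumes the .jenv layout that the original "grep pass" relies on.
jenv_password() {
  awk '$1 == "password:" {print $2; exit}'
}
```

Usage: juju api-endpoints | gui_urls, and jenv_password < ~/.juju/environments/amazon.jenv. Log in to the GUI with that password.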

Note that the steps above are prerequisites for any Juju deployment and can be reused every time you want to spin up the environment.
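Because the bootstrap and GUI steps are repeated each time you spin up an environment, they can live in a small function. This is a sketch, assuming the environment is named amazon as in the commands above; the function name juju_spin_up and its no-gui switch are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: bootstrap the "amazon" environment and, unless
# "no-gui" is passed, deploy and expose the Juju GUI on machine 0
# (the bootstrap machine).
juju_spin_up() {
  juju bootstrap amazon || return
  if [ "${1:-}" != "no-gui" ]; then
    juju deploy --to 0 juju-gui && juju expose juju-gui
  fi
}
```

Run juju_spin_up for the full setup, or juju_spin_up no-gui to skip the GUI.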

Now for your Hadoop use case. You have several options.

Deploy a single Hadoop node

juju deploy --constraints "cpu-cores=2 mem=4G root-disk=20G" hadoop

You can track the deployment with:

juju debug-log

and get information about the new instances with:

juju status

That single juju deploy command is all you need to deploy Hadoop (you can think of Juju as an evolution of apt for complex, multi-machine systems).
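If you want to change the machine size without editing the constraint string each time, the single-node deployment can be parameterized. This is a sketch; the function name deploy_single_hadoop is hypothetical and its defaults are the constraints used above. The point to keep is that the whole constraint string must reach juju as one quoted argument.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for the single-node deployment.
# $1 = CPU cores, $2 = memory, $3 = root disk (defaults as in the answer).
deploy_single_hadoop() {
  local cores=${1:-2} mem=${2:-4G} disk=${3:-20G}
  # Keep the constraints in ONE argument, exactly like the quoted original.
  juju deploy --constraints "cpu-cores=$cores mem=$mem root-disk=$disk" hadoop
}
```

For example, deploy_single_hadoop 4 8G 50G, then watch progress with juju status.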

Deploy a cluster with a master and 3 worker nodes running HDFS and MapReduce

juju deploy hadoop hadoop-master
juju deploy hadoop hadoop-slavecluster
juju add-unit -n 2 hadoop-slavecluster
juju add-relation hadoop-master:namenode hadoop-slavecluster:datanode
juju add-relation hadoop-master:resourcemanager hadoop-slavecluster:nodemanager
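The cluster commands above can be wrapped in a function that takes the number of worker units. The detail worth noting is that juju deploy already creates the first unit of hadoop-slavecluster, so add-unit only needs the remaining count (3 workers means -n 2, as above). This is a sketch; the function name deploy_hadoop_cluster is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for the master + workers cluster.
# $1 = total number of worker units (default 3, as in the answer).
deploy_hadoop_cluster() {
  local workers=${1:-3}
  # Reject anything that is not a positive integer.
  case $workers in
    *[!0-9]*) echo "worker count must be a positive integer" >&2; return 1 ;;
  esac
  workers=$((10#$workers))   # strip leading zeros (avoid octal parsing)
  if [ "$workers" -lt 1 ]; then
    echo "worker count must be a positive integer" >&2
    return 1
  fi
  juju deploy hadoop hadoop-master || return
  # "juju deploy" already creates the first worker unit...
  juju deploy hadoop hadoop-slavecluster || return
  # ...so only workers-1 extra units are added.
  if [ "$workers" -gt 1 ]; then
    juju add-unit -n "$((workers - 1))" hadoop-slavecluster || return
  fi
  # HDFS: namenode <-> datanodes; YARN/MapReduce: resourcemanager <-> nodemanagers.
  juju add-relation hadoop-master:namenode hadoop-slavecluster:datanode || return
  juju add-relation hadoop-master:resourcemanager hadoop-slavecluster:nodemanager
}
```

deploy_hadoop_cluster 3 issues the same five commands as the answer; deploy_hadoop_cluster 5 gives a larger cluster.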

Scale-out usage (separate HDFS and MapReduce clusters, experimental)

juju deploy hadoop hdfs-namenode
juju deploy hadoop hdfs-datacluster
juju add-unit -n 2 hdfs-datacluster
juju add-relation hdfs-namenode:namenode hdfs-datacluster:datanode
juju deploy hadoop mapred-resourcemanager
juju deploy hadoop mapred-taskcluster
juju add-unit -n 2 mapred-taskcluster
juju add-relation mapred-resourcemanager:mapred-namenode hdfs-namenode:namenode
juju add-relation mapred-taskcluster:mapred-namenode hdfs-namenode:namenode
juju add-relation mapred-resourcemanager:resourcemanager mapred-taskcluster:nodemanager
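The scale-out layout repeats the same "deploy, then add N-1 units" pattern twice, so a small helper keeps it readable. This is a sketch that issues the commands above in the same order; the names deploy_group and deploy_hadoop_scaleout are hypothetical, and both counts are assumed to be positive integers.

```bash
#!/usr/bin/env bash
# Deploy the hadoop charm under a service name with $2 units in total
# ("juju deploy" creates the first one, add-unit the rest).
deploy_group() {
  local service=$1 units=$2
  juju deploy hadoop "$service" || return
  if [ "$units" -gt 1 ]; then
    juju add-unit -n "$((units - 1))" "$service"
  fi
}

# Hypothetical wrapper for the experimental scale-out layout.
# $1 = HDFS data units, $2 = MapReduce task units (both default to 3).
deploy_hadoop_scaleout() {
  local data=${1:-3} tasks=${2:-3}
  juju deploy hadoop hdfs-namenode || return
  deploy_group hdfs-datacluster "$data" || return
  juju add-relation hdfs-namenode:namenode hdfs-datacluster:datanode || return
  juju deploy hadoop mapred-resourcemanager || return
  deploy_group mapred-taskcluster "$tasks" || return
  # Both MapReduce services are related to the HDFS namenode.
  juju add-relation mapred-resourcemanager:mapred-namenode hdfs-namenode:namenode || return
  juju add-relation mapred-taskcluster:mapred-namenode hdfs-namenode:namenode || return
  juju add-relation mapred-resourcemanager:resourcemanager mapred-taskcluster:nodemanager
}
```

With the defaults, deploy_hadoop_scaleout reproduces the ten commands above.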

For Pydoop, you will have to install it manually as in the first answer (you can reach each Juju machine with juju ssh followed by a unit name or machine number, e.g. juju ssh hadoop/0), or you can write a "charm" (a package that teaches Juju how to deploy Pydoop).
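For the manual route, juju ssh also accepts a command to run on the target, so the install can be looped over every unit. This is a sketch under several assumptions: pip is available on the units, Pydoop's build can find Java and Hadoop there (for example via JAVA_HOME and HADOOP_HOME), and "sudo pip install pydoop" is only a placeholder command. Check the Pydoop installation documentation for your versions and override PYDOOP_INSTALL_CMD if needed (for instance, if sudo drops the variables the build relies on). The function name install_pydoop_on_units is hypothetical.

```bash
#!/usr/bin/env bash
# Placeholder install command; override it before calling the function.
PYDOOP_INSTALL_CMD=${PYDOOP_INSTALL_CMD:-"sudo pip install pydoop"}

# Hypothetical helper: run the install command on each given Juju unit
# via "juju ssh <unit> <command>", stopping at the first failure.
install_pydoop_on_units() {
  if [ "$#" -eq 0 ]; then
    echo "usage: install_pydoop_on_units UNIT..." >&2
    return 1
  fi
  local unit
  for unit in "$@"; do
    echo "==> $unit" >&2
    juju ssh "$unit" "$PYDOOP_INSTALL_CMD" || return
  done
}
```

Take the actual unit names from juju status; for the cluster option they would typically look like hadoop-master/0, hadoop-slavecluster/0, hadoop-slavecluster/1 and so on.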

Tags: python, amazon-web-services, ubuntu, hadoop

S Anand