
How to run Hadoop on a Mesos cluster?

Tags:

hadoop

mesos

I am trying to set up an Apache Mesos cluster and run a Hadoop job on it. The documentation is above my level, so I am not able to understand it; maybe someone here can explain it to me:

Should I first set up a working Hadoop cluster, or first set up a Mesos cluster? And where do I point the slaves: in the Hadoop slaves file, or should only the registered Mesos slaves be used?

asked Nov 12 '13 by likeaprogrammer

2 Answers

The goal of Mesos is to provide an abstraction over your cluster, where Hadoop is just one service among others. For this to work, you first need to set up your Mesos cluster as the primary component; you can then add services like Hadoop on top of it through the Mesos abstraction.

There is an excellent tutorial from Mesosphere that you should take a look at; it explains in detail how to run Hadoop on top of Mesos, so that would be a good place to start.
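In practice, "Mesos first" means bringing up the Mesos daemons before touching Hadoop. A minimal sketch, assuming Mesos is already installed on every machine (the IP address and work directory below are placeholders, not values from the tutorial):

```shell
# On the master machine: start the Mesos master.
# Its web UI then becomes available on port 5050.
mesos-master --ip=192.168.1.10 --work_dir=/var/lib/mesos

# On each slave machine: start a Mesos slave and point it at the master.
mesos-slave --master=192.168.1.10:5050 --work_dir=/var/lib/mesos
```

Once every slave shows up in the web UI at port 5050 on the master, the cluster is ready for frameworks such as Hadoop to register with it.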

Alternatively, this company recently started an Elastic Mesos service, similar in nature to Amazon's Elastic MapReduce. So if you want to get started quickly with Hadoop on Mesos without having to go through the pain of configuring everything yourself, this is also a good place to start.

answered Oct 20 '22 by Charles Menguy

Once you have a Mesos cluster set up and running, such that the slaves show up in the Mesos WebUI, you can add Hadoop to it with the following steps:

  1. First, set up HDFS. Cloudera's Hadoop distribution is an easy way to do so; just follow the instructions at this link for setting it up. This will automatically set up the user accounts needed to run MapReduce jobs.
  2. You only need a namenode running on your master and a datanode running on each slave. Navigating to localhost:50070 will show you that the namenode is up and running, and will also list the running datanodes. You don't need tasktrackers or a jobtracker for now.
  3. Next, to integrate Hadoop with Mesos, go to this GitHub repository. Clone it on your PC and follow the instructions in the README.
  4. By this point you will have a jobtracker running on your master. Navigate to localhost:50030 to see that the jobtracker is running. You don't need tasktrackers on your slaves: Mesos will start them for you automatically, using the tar file you uploaded to HDFS.
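The integration in step 3 essentially boils down to pointing the jobtracker at Mesos in `mapred-site.xml`. A hedged sketch of the relevant properties, loosely following the mesos/hadoop README (exact property names and values may differ between versions, and the hosts, ports, and tarball path below are placeholders):

```xml
<configuration>
  <!-- Tell the jobtracker to schedule tasks through Mesos
       instead of managing tasktrackers itself. -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.MesosScheduler</value>
  </property>
  <!-- Where the Mesos master is listening. -->
  <property>
    <name>mapred.mesos.master</name>
    <value>192.168.1.10:5050</value>
  </property>
  <!-- The Hadoop tarball in HDFS that Mesos uses to
       launch tasktrackers on the slaves. -->
  <property>
    <name>mapred.mesos.executor.uri</name>
    <value>hdfs://192.168.1.10:9000/hadoop.tar.gz</value>
  </property>
</configuration>
```

With something like this in place, starting the jobtracker registers it as a framework with Mesos, and Mesos then launches tasktrackers on the slaves on demand.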

You can also consult the Mesosphere tutorial for anything you are unsure about. Just don't follow all the steps there, as they are not written for the latest Mesos versions.

answered Oct 20 '22 by Aviral Agarwal