
For a single CDH (Hadoop) cluster installation, which host should I use?

I started with a Windows 7 computer and set up an Ubuntu Linux virtual machine, which I run using VirtualBox. I launched Cloudera Manager Free Edition version 4 and have been following the prompts at localhost:7180.

I am now stuck at the prompt that asks me to "Specify hosts for your CDH cluster installation." Can I install and run all of the Hadoop components in the Linux virtual machine alone?

Please point me in the right direction as to which host I should specify.

Marina asked Nov 12 '22 18:11



1 Answer

Yes, you can run CDH in a single Linux virtual machine. You can do so in either "standalone" or "pseudo-distributed" mode. IMHO, pseudo-distributed mode is the most effective choice.

In this mode, each Hadoop daemon (NameNode, DataNode, and so on) runs in its own Java virtual machine (JVM) on the same host, so together they simulate a cluster with multiple nodes.
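One way to see the separate JVMs is to list them with `jps`, the process-listing tool that ships with the JDK. The sketch below is only an illustration, not part of Cloudera's tooling: the helper names `parse_jps` and `running_daemons` are made up for this example, and the set of daemon names assumes a stock Hadoop install (which ones appear depends on whether MRv1 or YARN is in use).

```python
import shutil
import subprocess

# Daemon class names as `jps` reports them. Assumption: a stock Hadoop
# install; MRv1 shows JobTracker/TaskTracker, YARN shows
# ResourceManager/NodeManager.
HADOOP_DAEMONS = {
    "NameNode", "DataNode", "SecondaryNameNode",
    "JobTracker", "TaskTracker",
    "ResourceManager", "NodeManager",
}


def parse_jps(output):
    """Map each Hadoop daemon name found in `jps` output to its PID."""
    daemons = {}
    for line in output.splitlines():
        parts = line.split()
        # A normal `jps` line looks like "<pid> <main class name>".
        if len(parts) >= 2 and parts[0].isdigit() and parts[1] in HADOOP_DAEMONS:
            daemons[parts[1]] = int(parts[0])
    return daemons


def running_daemons():
    """Run `jps` if it is on PATH; return an empty dict otherwise."""
    if shutil.which("jps") is None:
        return {}
    result = subprocess.run(["jps"], capture_output=True, text=True, check=False)
    return parse_jps(result.stdout)


if __name__ == "__main__":
    for name, pid in sorted(running_daemons().items()):
        print(f"{name}: JVM pid {pid}")
```

Note that `jps` only lists JVMs owned by the current user, so daemons started under service accounts may only show up when the script is run with elevated privileges.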

Cloudera documents how to deploy in pseudo-distributed mode:

https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_qs_cdh5_pseudo.html

Note: there are three ways to deploy CDH:

  1. standalone: a single machine running a single JVM
  2. pseudo-distributed: a single machine running several JVMs that together simulate a cluster
  3. distributed: a real cluster, with several nodes serving different purposes (workers, NameNode, etc.).
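For the "Specify hosts" prompt in a single-VM setup, a reasonable candidate is the VM's own hostname. The sketch below prints the names the VM knows itself by. It assumes the wizard accepts a hostname or IP address for the machine it runs on, which this answer does not confirm, and `candidate_hosts` is a hypothetical helper name.

```python
import socket


def candidate_hosts():
    """Collect the names this machine reports for itself."""
    hostname = socket.gethostname()
    fqdn = socket.getfqdn()
    try:
        # May return a loopback address if /etc/hosts maps the hostname there.
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        ip = None  # hostname does not resolve at all
    return {"hostname": hostname, "fqdn": fqdn, "ip": ip}


if __name__ == "__main__":
    for key, value in candidate_hosts().items():
        print(f"{key}: {value}")
```

On Ubuntu the hostname is often mapped to 127.0.1.1 in /etc/hosts, so the printed IP may be a loopback address; if the hostname does not resolve at all, that is worth fixing before continuing.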
evinhas answered Nov 15 '22 07:11
