
HADOOP / YARN - Are the ResourceManager and the hdfs NameNode always installed on the same host?

Are the “resource manager” and the “hdfs namenode” always installed on the same host?

1) When I want to send an HTTP request (YARN REST API) to get a new application ID, I use this URI:

http://<rm http address:port>/ws/v1/cluster/apps/new-application

This port is the ResourceManager web UI HTTP port; its default value is 8088, as shown in img1 (source for img1: Yarn Ports).

2) When I want to send an HTTP request (WebHDFS REST API) for an HDFS command, for example to get a file's status, I use this URI:

http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILESTATUS

This <PORT> is the NameNode web UI HTTP port; its default value is 50070, as shown in img2 (source for img2: HDFS Ports).

Are those components (ResourceManager and NameNode) always installed on the same host?

Any help would be appreciated, Thanks!

Xquery asked Mar 30 '15 12:03




1 Answer

The ResourceManager and the NameNode do not have to run on the same machine. They typically share a host only in a single-node setup or in a small cluster with few nodes. In a large cluster, the ResourceManager, NameNode, and SecondaryNameNode usually run on separate master nodes.
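In practice this means a client should treat the two hosts as independent settings. Here is a minimal sketch that builds the two URIs from the question with separate host parameters. The hostnames `rm.example.com` and `nn.example.com` and the function names are hypothetical, and the ports are just the defaults quoted in the question (your cluster may override them).

```python
from urllib.parse import quote


def new_application_url(rm_host, rm_port=8088):
    """URI for the YARN REST call that requests a new application ID.

    rm_host is the ResourceManager host; 8088 is the default web UI port
    quoted in the question.
    """
    return f"http://{rm_host}:{rm_port}/ws/v1/cluster/apps/new-application"


def file_status_url(nn_host, path, nn_port=50070):
    """URI for the WebHDFS GETFILESTATUS operation on an HDFS path.

    nn_host is the NameNode host; 50070 is the default web UI port
    quoted in the question.
    """
    # Drop the leading slash so the path joins cleanly after /webhdfs/v1/,
    # and percent-encode anything that is not URL-safe (slashes are kept).
    encoded_path = quote(path.lstrip("/"))
    return f"http://{nn_host}:{nn_port}/webhdfs/v1/{encoded_path}?op=GETFILESTATUS"


# Hypothetical hosts: the two services may or may not share a machine.
rm_url = new_application_url("rm.example.com")
nn_url = file_status_url("nn.example.com", "/user/alice/data.txt")
print(rm_url)
print(nn_url)
```

On a single-node setup you would simply pass the same hostname to both functions. Note that the new-application call is sent as a POST request, while GETFILESTATUS is a GET.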

[Image: cluster topology]

Have a look at these links:

Master Nodes in Hadoop Clusters

HortonWorks: Typical Hadoop Cluster
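To find out where each service actually runs on your cluster, you can read the web addresses from the Hadoop configuration files instead of guessing. The sketch below assumes the addresses are set explicitly through `yarn.resourcemanager.webapp.address` (in yarn-site.xml) and `dfs.namenode.http-address` (in hdfs-site.xml). The XML snippets, hostnames, and the `read_property` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the value of a Hadoop configuration property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


# Hypothetical excerpts; on a real cluster you would read the files from
# the Hadoop configuration directory instead.
YARN_SITE = """<configuration>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>rm.example.com:8088</value>
  </property>
</configuration>"""

HDFS_SITE = """<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>nn.example.com:50070</value>
  </property>
</configuration>"""

rm_address = read_property(YARN_SITE, "yarn.resourcemanager.webapp.address")
nn_address = read_property(HDFS_SITE, "dfs.namenode.http-address")

# Compare only the host part to see whether the two services share a machine.
same_host = rm_address.split(":")[0] == nn_address.split(":")[0]
print(rm_address, nn_address, same_host)
```

If a property is missing from the file, Hadoop falls back to its built-in default, so a `None` result here does not mean the service is not running.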

Hamza Zafar answered Oct 08 '22 05:10