
Hadoop Ports Clarification

I am learning Hadoop and am a bit confused about the default ports and their locations.

When I open localhost:50070 in the browser, I get the HDFS info page. The Hadoop docs list the following ports, among others.

hdfs-default.xml

dfs.datanode.http.address   0.0.0.0:50075 
dfs.datanode.address     0.0.0.0:50010
dfs.namenode.http-address    0.0.0.0:50070
dfs.namenode.backup.http-address    0.0.0.0:50105

mapred-default.xml

mapreduce.jobtracker.http.address   0.0.0.0:50030
mapreduce.tasktracker.http.address  0.0.0.0:50060

yarn-default.xml

yarn.resourcemanager.address     ${yarn.resourcemanager.hostname}:8032
yarn.resourcemanager.webapp.address  ${yarn.resourcemanager.hostname}:8088

Now, while configuring Hadoop 2 on my machine, I did:

$ cd /usr/local/hadoop/etc/hadoop
$ vi core-site.xml

<property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:9000</value>
</property>

Question: There are so many ports mentioned in the default XML files and elsewhere in the docs.

1) Only localhost:50070 returns meaningful data (HDFS health). What about the other ports? Do the others just not return any information?

2) In yarn-default.xml both are ResourceManager ports; the difference is that one is the webapp port. Only when I open localhost:8088 in the browser do I get the cluster (single node in this case) information. Then what is port 8083? In some sample code I see 8083 used as the RM port. This is not clear to me. Can someone please explain?

3) I changed the HDFS port to 9000. Is that standard?

4) How do I see the AppMaster, JobTracker and TaskTracker ports?

5) I thought that in YARN (Hadoop 2) there is no JobTracker or TaskTracker. Then what is the purpose of these ports?

I am having a nightmare with these basic questions...

Thanks, Amit

Dutta asked Nov 12 '13 19:11


1 Answer

Hadoop provides web UIs for looking into the cluster. Through a browser they show the status of the cluster, job details (running, failed) and so on, which is a relief because you don't have to remember the equivalent terminal commands. You have already pointed out some of the important ports for these. Those are the defaults, and you can change them in the configuration files.

Now I will answer your questions one by one. Judging from your core-site.xml, I assume Hadoop is running in pseudo-distributed mode.

1) Only localhost:50070 returns meaningful data (HDFS health). What about the other ports? Do the others just not return any information?

I will use the ports you listed to avoid confusion.

The other HTTP ports can also be opened in a browser: for example, localhost:50075 shows DataNode details and localhost:8088 shows running and completed applications. Properties whose names do not contain http-address, http.address or webapp.address are not web UIs. They are used for communication between processes (RPC or data transfer), so they return nothing useful in a browser. Examples of those ports are 8032 and 50010.
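The naming rule above can be sketched in a few lines of Python. This is only a heuristic based on the property names quoted in the question, not something Hadoop itself provides; the helper names `is_web_ui`, `port_of` and `split_ports` are made up for illustration.

```python
# Defaults copied from the question. The name-based rule is a heuristic.
DEFAULTS = {
    "dfs.datanode.http.address": "0.0.0.0:50075",
    "dfs.datanode.address": "0.0.0.0:50010",
    "dfs.namenode.http-address": "0.0.0.0:50070",
    "dfs.namenode.backup.http-address": "0.0.0.0:50105",
    "yarn.resourcemanager.address": "${yarn.resourcemanager.hostname}:8032",
    "yarn.resourcemanager.webapp.address": "${yarn.resourcemanager.hostname}:8088",
}


def is_web_ui(property_name):
    """Guess whether a property names a browser-facing (HTTP) address."""
    parts = property_name.replace("-", ".").split(".")
    return "http" in parts or "webapp" in parts


def port_of(address):
    """Return the port from a 'host:port' string."""
    return int(address.rsplit(":", 1)[1])


def split_ports(defaults):
    """Split ports into (web UI ports, other ports), each sorted."""
    web, other = [], []
    for name, address in defaults.items():
        (web if is_web_ui(name) else other).append(port_of(address))
    return sorted(web), sorted(other)


if __name__ == "__main__":
    web, other = split_ports(DEFAULTS)
    print("open in a browser:", web)
    print("not meant for a browser:", other)
```

With the defaults from the question, 8032 and 50010 end up in the second group, which matches the examples given above.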

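2) In yarn-default.xml both are ResourceManager ports; the difference is that one is the webapp port. Only when I open localhost:8088 in the browser do I get the cluster (single node in this case) information. Then what is port 8083? In some sample code I see 8083 used as the RM port. This is not clear to me. Can someone please explain?

I hope the answer above clears this up: 8088 (yarn.resourcemanager.webapp.address) is the ResourceManager web UI, while 8032 (yarn.resourcemanager.address) is the port clients use to talk to the ResourceManager and is not meant for a browser. Port 8083 is not among the defaults you listed, so I cannot say where the sample code got it.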

3) I changed the HDFS port to 9000. Is that standard?

The default NameNode port is 8020, but you can use any free port. I don't know whether 9000 counts as a standard, though I have seen it used in some vendor-provided Hadoop distributions other than Apache's.
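To see which port a given core-site.xml actually selects, a small Python sketch like the one below can help. It assumes the property from the question is wrapped in a `<configuration>` root element, and it falls back to 8020 when the URI carries no port; `namenode_port` is a hypothetical helper, not part of Hadoop.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

DEFAULT_NAMENODE_PORT = 8020  # the default mentioned in the answer

# The property from the question, wrapped in a <configuration> root
# (assumed here so the string parses as a complete XML document).
CORE_SITE = """
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
"""


def namenode_port(core_site_xml, key="fs.default.name"):
    """Return the port configured under `key`, or 8020 if none is given."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == key:
            port = urlparse(prop.findtext("value").strip()).port
            return port if port is not None else DEFAULT_NAMENODE_PORT
    return DEFAULT_NAMENODE_PORT


if __name__ == "__main__":
    print(namenode_port(CORE_SITE))
```

For the configuration in the question this yields 9000. Dropping the `:9000` suffix from the value makes the sketch fall back to 8020.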

4) How do I see the AppMaster, JobTracker and TaskTracker ports?

I could not quite understand your question. If you are asking about the web UIs, they are covered in the answer to question 1.
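If the question is really "which of these ports is anything listening on?", a plain TCP probe is one way to find out. The sketch below uses only Python's standard `socket` module. The port list is taken from the question, `is_listening` is a made-up helper name, and an open port only shows that some process accepted the connection, not which Hadoop daemon it is.

```python
import socket

# Ports quoted in the question; adjust to match your own configuration.
PORTS = [50070, 50075, 50010, 50030, 50060, 8032, 8088, 9000]


def is_listening(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in PORTS:
        state = "open" if is_listening("127.0.0.1", port) else "closed"
        print(port, state)
```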

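5) I thought that in YARN (Hadoop 2) there is no JobTracker or TaskTracker. Then what is the purpose of these ports?

As I understand it, YARN is a layer that sits between MapReduce and HDFS and takes over the management of resources and jobs. The JobTracker and TaskTracker daemons do not run under YARN: their work is split between the ResourceManager, the NodeManagers and a per-application ApplicationMaster. The mapreduce.jobtracker.* and mapreduce.tasktracker.* entries in mapred-default.xml are leftovers from classic MapReduce (MRv1) and should have no effect when jobs run on YARN, which would explain why nothing answers on 50030 or 50060.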

Someone can correct me if I went wrong somewhere.

Thanks and regards, Bibin

MJBibin answered Nov 16 '22 02:11