What's the right way to use the historyserver in Hadoop 2.2?

I am using hadoop-2.2.0. Can I start the historyserver on both the master node and the slave node?

  1. I am not sure: do I need to start the history server on the slave node too?

  2. If I start a single history server on the master, can I get the logs of all jobs?

  3. If I do need to start the servers on both the master and the slave nodes, is there a single command that starts them all, rather than starting each server one by one?

Any comments are welcome.

asked Feb 18 '14 at 02:02 by Allen





1 Answer

You need only one historyserver. It can run on any node you like, including a dedicated node of its own, but it traditionally runs on the same node as the ResourceManager. That single history server is declared in mapred-site.xml:

  • mapreduce.jobhistory.address: MapReduce JobHistory Server host:port. Default port is 10020.
  • mapreduce.jobhistory.webapp.address: MapReduce JobHistory Server Web UI host:port. Default port is 19888.
  • mapreduce.jobhistory.intermediate-done-dir: Directory (in HDFS) where history files are written by MapReduce jobs. Default is /mr-history/tmp.
  • mapreduce.jobhistory.done-dir: Directory (in HDFS) where history files are managed by the MR JobHistory Server. Default is /mr-history/done.
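As a minimal sketch of what that declaration looks like, the Python snippet below writes the four properties above into a Hadoop-style `<configuration>` document. The host name `master` is a hypothetical placeholder for whichever node you pick, and the ports and directories simply restate the defaults listed above; adjust them to your cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical host name of the one node that runs the history server.
HISTORY_HOST = "master"

# The four properties from the list above; values restate the stated defaults.
JOBHISTORY_PROPS = {
    "mapreduce.jobhistory.address": f"{HISTORY_HOST}:10020",
    "mapreduce.jobhistory.webapp.address": f"{HISTORY_HOST}:19888",
    "mapreduce.jobhistory.intermediate-done-dir": "/mr-history/tmp",
    "mapreduce.jobhistory.done-dir": "/mr-history/done",
}


def build_mapred_site(props: dict[str, str]) -> str:
    """Return a mapred-site.xml style <configuration> document as a string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_mapred_site(JOBHISTORY_PROPS))
```

Because every node reads the same mapred-site.xml, all of them point at this one address, which is why a second history server on the slave nodes is not needed.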

You access the history through the historyserver REST API; you do not read the internal history files directly. For casual browsing, the history is also available from the ResourceManager web UI.
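A minimal sketch of querying that REST API from Python follows. It assumes the history server's web address is `http://master:19888` (a hypothetical host on the default web UI port) and uses the `/ws/v1/history/mapreduce/jobs` resource, whose JSON body is expected to look like `{"jobs": {"job": [...]}}`. The helper names `jobs_url`, `parse_jobs` and `fetch_jobs` are illustrative only, not part of Hadoop.

```python
import json
import urllib.request

# Assumed value of mapreduce.jobhistory.webapp.address; "master" is hypothetical.
HISTORY_WEBAPP = "http://master:19888"


def jobs_url(base: str = HISTORY_WEBAPP) -> str:
    """URL of the History Server resource that lists finished MapReduce jobs."""
    return base.rstrip("/") + "/ws/v1/history/mapreduce/jobs"


def parse_jobs(payload: dict) -> list[tuple[str, str]]:
    """Extract (job id, state) pairs from a /jobs JSON response.

    Expects {"jobs": {"job": [...]}}; a missing or null list means no jobs.
    """
    jobs = (payload.get("jobs") or {}).get("job") or []
    return [(job.get("id", ""), job.get("state", "")) for job in jobs]


def fetch_jobs(base: str = HISTORY_WEBAPP, timeout: float = 10.0) -> list[tuple[str, str]]:
    """Query a running History Server and return (job id, state) pairs."""
    req = urllib.request.Request(jobs_url(base), headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_jobs(json.load(resp))
```

With a history server actually running, `for job_id, state in fetch_jobs(): print(job_id, state)` would list every job it knows about. Parsing is kept separate from the HTTP call so the response handling can be checked without a cluster.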

answered Nov 05 '22 at 12:11 by Remus Rusanu
