
Difference between job, application, task, task attempt logs in Hadoop, Oozie

Tags: hadoop, oozie

I'm running an Oozie job with multiple actions, and there's one part I couldn't get to work. While troubleshooting, I've been overwhelmed by the number of logs.

In the YARN UI (yarn.resourcemanager.webapp.address in yarn-site.xml, normally on port 8088), there are the application_<app_id> logs.

In the Job History Server (yarn.log.server.url in yarn-site.xml, ours on port 19888), there are the job_<job_id> logs. (These job logs should also show up in Hue's Job Browser, right?)

In Hue's Oozie workflow editor, there are task and task_attempt entries (I'm not sure whether they're the same; it's all a mixed-up soup to me by now), which redirect to the Job Browser if you click around.

Can someone explain the difference between these things from a Hadoop/Oozie architectural standpoint?

P.S. I've also seen container_<container_id> in the logs. Please include it in your explanation, in relation to the things above.

oikonomiyaki asked Feb 02 '16 06:02


1 Answer

In YARN terms, the programs that run on a cluster are called applications. In MapReduce terms, they are called jobs. So, if you are running MapReduce on YARN, a job and an application are the same thing: each MapReduce job runs as one YARN application. If you look closely, the job ID and the application ID share the same numeric part and differ only in the prefix (job_ versus application_).
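To jump between the Job History Server and the YARN UI, the ID correspondence can be applied mechanically. The following is only a minimal sketch: the function names are hypothetical, the sample ID is made up, and it assumes the usual `<prefix>_<cluster timestamp>_<sequence number>` layout.

```python
import re

# Assumed layout: <prefix>_<cluster timestamp>_<sequence number>
_ID_RE = re.compile(r"^(job|application)_(\d+)_(\d+)$")


def to_application_id(job_id: str) -> str:
    """Map a MapReduce job ID to the matching YARN application ID."""
    match = _ID_RE.match(job_id)
    if match is None or match.group(1) != "job":
        raise ValueError(f"not a job ID: {job_id!r}")
    return f"application_{match.group(2)}_{match.group(3)}"


def to_job_id(application_id: str) -> str:
    """Map a YARN application ID to the matching MapReduce job ID."""
    match = _ID_RE.match(application_id)
    if match is None or match.group(1) != "application":
        raise ValueError(f"not an application ID: {application_id!r}")
    return f"job_{match.group(2)}_{match.group(3)}"


if __name__ == "__main__":
    # Made-up ID, for illustration only.
    print(to_application_id("job_1454393000000_0042"))
```

The reverse mapping only makes sense for applications that are MapReduce jobs; other YARN applications have no job ID.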

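A MapReduce job consists of several tasks, each of which is either a map task or a reduce task. Each execution of a task is a task attempt. If an attempt fails, the task is launched again, possibly on another node, and that retry is a new attempt of the same task.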
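The job, task, and attempt hierarchy is also visible in the IDs themselves. As a rough sketch, assuming attempt IDs of the form `attempt_<cluster timestamp>_<job sequence>_<m|r>_<task number>_<attempt number>` (the parser and the sample ID below are illustrative, not part of Hadoop):

```python
import re
from typing import NamedTuple


class AttemptInfo(NamedTuple):
    job_id: str          # the job this attempt belongs to
    task_id: str         # the task this attempt is a run of
    task_type: str       # "map" or "reduce"
    attempt_number: int  # 0 for the first run, 1 for the first retry, ...


# Assumed layout:
# attempt_<cluster timestamp>_<job seq>_<m|r>_<task number>_<attempt number>
_ATTEMPT_RE = re.compile(r"^attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)$")


def parse_attempt_id(attempt_id: str) -> AttemptInfo:
    """Split a task attempt ID into its job, task, and attempt parts."""
    match = _ATTEMPT_RE.match(attempt_id)
    if match is None:
        raise ValueError(f"not a task attempt ID: {attempt_id!r}")
    timestamp, job_seq, kind, task_num, attempt_num = match.groups()
    job_id = f"job_{timestamp}_{job_seq}"
    task_id = f"task_{timestamp}_{job_seq}_{kind}_{task_num}"
    task_type = "map" if kind == "m" else "reduce"
    return AttemptInfo(job_id, task_id, task_type, int(attempt_num))


if __name__ == "__main__":
    # Made-up ID: second run (attempt 1) of map task 3 in job 42.
    print(parse_attempt_id("attempt_1454393000000_0042_m_000003_1"))
```

Grouping log entries by `task_id` and then by `attempt_number` is one way to see which tasks were retried.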

Container is a YARN term: it is the unit of resource allocation. For example, each MapReduce task attempt normally runs in its own container.
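A container ID seen in a log can be traced back to the application that owns it. This sketch assumes the common layout `container_<cluster timestamp>_<application sequence>_<application attempt>_<container number>`, optionally with an `e<epoch>_` segment after `container_`; the function name and sample IDs are made up for illustration.

```python
import re

# Assumed layout:
# container_[e<epoch>_]<cluster timestamp>_<app seq>_<app attempt>_<container number>
_CONTAINER_RE = re.compile(r"^container_(?:e\d+_)?(\d+)_(\d+)_(\d+)_(\d+)$")


def application_of(container_id: str) -> str:
    """Return the ID of the YARN application that a container belongs to."""
    match = _CONTAINER_RE.match(container_id)
    if match is None:
        raise ValueError(f"not a container ID: {container_id!r}")
    timestamp, app_seq = match.group(1), match.group(2)
    return f"application_{timestamp}_{app_seq}"


if __name__ == "__main__":
    # Made-up IDs, for illustration only.
    print(application_of("container_1454393000000_0042_01_000005"))
    print(application_of("container_e17_1454393000000_0042_01_000005"))
```

Several containers map to the same application, since every task attempt, as well as the application master, gets its own container.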

facha answered Oct 16 '22 09:10