
Confusion over Hadoop JobTracker API


I'm trying to collect some information from the job tracker. For starters, I'd like to get information about running jobs, such as the job ID or job name. I'm already stuck, though. Here is what I've got so far (it prints the job IDs of the currently running jobs):

public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "zk1.myhost,zk2.myhost,zk3.myhost");
    conf.set("hbase.zookeeper.property.clientPort", "2181");

    InetSocketAddress jobtracker = new InetSocketAddress("jobtracker.mapredhost.myhost", 8021);
    JobClient jobClient = new JobClient(jobtracker, conf);
    JobStatus[] jobs = jobClient.jobsToComplete();

    for (int i = 0; i < jobs.length; i++) {
        JobStatus js = jobs[i];
        if (js.getRunState() == JobStatus.RUNNING) {
            JobID jobId = js.getJobID();
            System.out.println(jobId);
        }
    }
}

The code above works like a charm for displaying the job ID, but now I want to display the job name as well. So I added this line after printing the job ID:

System.out.println(jobClient.getJob(jobId).getJobName());

and I get this exception:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:226)
    at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:1080)
    at org.apache.test.JobTracker.main(JobTracker.java:28)

jobClient is not null; I verified this with a null check in an if statement. According to the stack trace, the NullPointerException is thrown inside the call jobClient.getJob(jobId). What am I doing wrong here?

According to the API, this should work:

http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapred/JobClient.html#getJob(org.apache.hadoop.mapred.JobID)

First get a RunningJob from jobClient, then, once you have the running job, get its name: http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapred/RunningJob.html#getJobName()

Has anyone done something like this before? I could use jsoup to obtain this information through a GET request, but I think using the API is the better way to get it.

Question update: here are my Hadoop/HBase dependencies:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>0.23.1-mr1-cdh4.0.0b2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>0.23.1-mr1-cdh4.0.0b2</version>
    <exclusions>
        <exclusion>
            <groupId>org.mortbay.jetty</groupId>
            <artifactId>jetty</artifactId>
        </exclusion>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.92.1-cdh4b2-SNAPSHOT</version>
</dependency>

Bounty update :

Here are my imports :

import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.JobStatus;

Here is the output of System.out.println(jobId) :

job_201207031810_1603

There is only one job currently running.

Gandalf StormCrow asked Sep 04 '12 13:09


1 Answer

Have a look at the inner class NetworkedJob of JobClient.
(source: /home/user/hadoop/src/mapred/org/apache/hadoop/mapred/JobClient.java)

Its constructor fetches the Configuration object from JobClient in line 225, but that object is null because the constructor new JobClient(InetSocketAddress jobTrackAddr, Configuration conf) doesn't set it:

// Set the completion poll interval from the configuration.
// Default is 5 seconds.
Configuration conf = JobClient.this.getConf();
this.completionPollIntervalMillis = conf.getInt(COMPLETION_POLL_INTERVAL_KEY,
    DEFAULT_COMPLETION_POLL_INTERVAL); // NPE occurs here!

As a workaround, set it manually after creating the JobClient object. This will solve your problem:

..
JobClient jobClient = new JobClient(jobtracker, conf);
jobClient.setConf(conf); 
....
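
To make the mechanism easier to see without a cluster, here is a minimal, dependency-free sketch of the same pattern. It is not Hadoop code: Conf, Client, Job and the "poll.interval" key are hypothetical stand-ins for Configuration, JobClient, NetworkedJob and the poll interval key, and the model assumes (as described above) that the two-argument constructor never stores the configuration in the field that getConf() returns.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfNpeSketch {

    /** Hypothetical stand-in for Hadoop's Configuration: a plain key/value store. */
    static class Conf {
        private final Map<String, Integer> values = new HashMap<>();

        void setInt(String key, int value) {
            values.put(key, value);
        }

        int getInt(String key, int defaultValue) {
            return values.getOrDefault(key, defaultValue);
        }
    }

    /** Hypothetical stand-in for JobClient. */
    static class Client {
        private Conf conf; // stays null unless setConf is called

        Client(String trackerAddress, Conf conf) {
            // Assumption of this model: the constructor uses conf to connect
            // but never assigns it to the field.
        }

        void setConf(Conf conf) {
            this.conf = conf;
        }

        Conf getConf() {
            return conf;
        }

        /** Inner class that, like NetworkedJob, reads the outer instance's conf. */
        class Job {
            final int pollIntervalMillis;

            Job() {
                Conf c = Client.this.getConf();
                // Throws NullPointerException when c is null.
                pollIntervalMillis = c.getInt("poll.interval", 5000);
            }
        }

        Job getJob() {
            return new Job();
        }
    }

    /** Returns true if client.getJob() throws a NullPointerException. */
    static boolean getJobThrowsNpe(Client client) {
        try {
            client.getJob();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Conf conf = new Conf();
        Client client = new Client("jobtracker.example:8021", conf);
        System.out.println("before setConf, NPE: " + getJobThrowsNpe(client)); // true

        client.setConf(conf); // the workaround
        System.out.println("after setConf, NPE: " + getJobThrowsNpe(client)); // false
    }
}
```

In this model the first getJob() call fails inside the inner class constructor, just like the stack trace in the question, and the same call succeeds once setConf has been called on the client.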

Sidenote:

I instantiated the Configuration object via:

Configuration conf = new Configuration();
conf.addResource(new Path("/path_to/core-site.xml"));
conf.addResource(new Path("/path_to/hdfs-site.xml"));
Lorand Bendig answered Sep 19 '22 16:09