Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pig keeps trying to connect to job history server (and fails)

I'm running a Pig job that fails to connect to the Hadoop job history server.

The task (usually any task with GROUP BY) runs for a while and then it starts with a message like:

2015-04-21 19:05:22,825 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2015-04-21 19:05:26,721 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-04-21 19:05:29,721 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

It then continues for a while retrying the connection. Sometimes it precedes further with the job. Othertimes it throws this exception:

2015-04-21 19:05:55,822 [main] WARN  org.apache.pig.tools.pigstats.mapreduce.MRJobStats - Unable to get job counters
java.io.IOException: java.io.IOException: java.net.NoRouteToHostException: No Route to Host from  cluster-01/10.10.10.11 to 0.0.0.0:10020 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
    at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getCounters(HadoopShims.java:132)
    at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addCounters(MRJobStats.java:284)
    at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:235)
    at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:165)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:360)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:280)

I found this question here but in my case the job history server is started. If I run netstat, I find:

tcp        0      0 0.0.0.0:10020           0.0.0.0:*               LISTEN      12073/java       off (0.00/0/0)

Where 12073 is ...

12073 pts/4    Sl     0:07 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Dproc_historyserver -Xmx1000m -Djava.library.path=/data/hadoop/hadoop/lib -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/data/hadoop/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/data/hadoop/hadoop/logs -Dhadoop.log.file=mapred-hadoop-historyserver-cluster-01.log -Dhadoop.root.logger=INFO,RFA -Dmapred.jobsummary.logger=INFO,JSA -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer

I tried opening the port 10200 in case it was a firewall issue:

ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:10020

... but no luck.

After a few minutes, some of the tasks just arbitrarily continue to the next part.

I'm using Hadoop 2.3 and Pig 0.14.

My question is:

1) What are the possible reasons why Pig cannot connect to the job history server (JHS) given that the JHS is running on the same port that Pig looks for it?

... or failing that ...

2) Is there any way to just tell Pig to stop trying to connect to the JHS and continue with the task?

like image 994
badroit Avatar asked Apr 21 '15 22:04

badroit


2 Answers

It seems that most Hadoop installation/configuration guides neglect to mention configuring the Job History Server. It seems that Pig, in particular, relies on this server. It also seems like the default (local) settings for the JHS won't work in a multi-node cluster.

The solution was to add the hostname of the server into the configuration in mapred-site.xml to make sure it could be accesses from the other machines. (In my version of the file, the lines had to be added as "new" ... there were no previous settings.)

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>cm:10020</value>
  <description>Host and port for Job History Server (default 0.0.0.0:10020)</description>
</property>

Then restart the job history server:

mr-jobhistory-daemon.sh stop historyserver
mr-jobhistory-daemon.sh start historyserver

If you get a bind exception (port in use), it means the stop didn't work. Either

  1. Use ps ax | grep -e JobHistory to get the process and kill it manually with kill -9 [pid]. Then call the start command above again. Or

  2. Use a different port in the configuration

Pig should pick up the new settings automatically. Run a Pig script and hope for the best.

like image 171
badroit Avatar answered Oct 16 '22 16:10

badroit


start history server in hadoop bin using the below command

bin$ ./mr-jobhistory-daemon.sh start historyserver

run pig using the below command

$pig
like image 24
y durga prasad Avatar answered Oct 16 '22 18:10

y durga prasad