Connection Error in Apache Pig

Question

I am running Apache Pig 0.11.1 with Hadoop 2.0.5.

Most simple jobs that I run in Pig work perfectly fine.

However, whenever I try to use GROUP BY on a large dataset, or the LIMIT operator, I get these connection errors:

2013-07-29 13:24:08,591 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2013-07-29 11:57:29,421 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 11:57:30,421 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 11:57:31,422 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
...
2013-07-29 13:24:18,597 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2013-07-29 13:24:18,598 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:gpadmin (auth:SIMPLE) cause:java.io.IOException

The strange thing is that after these errors keep appearing for about 2 minutes, they stop and the correct output shows up at the bottom.

So Hadoop is running fine and computing the proper output. The problem is just these connection errors that keep popping up.

The LIMIT operator always triggers this error. It happens in both MapReduce mode and local mode. The GROUP BY operator works fine on small datasets.

One thing I have noticed is that whenever this error appears, the job has created and run multiple JAR files. However, after a few minutes of these messages popping up, the correct output finally appears.

Any suggestions on how to get rid of these messages?

Andy Botelho · Accepted Answer

Yes, the problem was that the job history server was not running.

All we had to do to fix it was enter this command at the command prompt:

mr-jobhistory-daemon.sh start historyserver

This command starts up the job history server. Now if we enter 'jps', we can see that the JobHistoryServer is running and my Pig jobs no longer waste time trying to connect to the server.
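If you want to confirm from a script (or from another machine) that the history server is actually accepting connections, instead of reading jps output, one option is simply to attempt a TCP connection to its RPC address. The following is a minimal Python sketch, assuming the default address 0.0.0.0:10020 that the client keeps retrying in the question's log; port_is_open is a hypothetical helper name, and you should substitute the host and port from mapreduce.jobhistory.address if you have changed it.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it only proves that *something* is listening on the
    port, not that it is the MapReduce JobHistoryServer.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Assumption: default mapreduce.jobhistory.address (0.0.0.0:10020).
    # "localhost" is used here because connecting to 0.0.0.0 is not portable.
    if port_is_open("localhost", 10020):
        print("Something is listening on port 10020 (likely the JobHistoryServer)")
    else:
        print("Nothing on port 10020; try: mr-jobhistory-daemon.sh start historyserver")
```

A successful connection is a reasonable proxy for "the history server is up", which is exactly the condition whose absence produces the retry messages.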

sushilprj · Answer

I think this problem is related to the Hadoop mapred-site.xml configuration. By default the History Server address (mapreduce.jobhistory.address) is 0.0.0.0:10020, i.e. the local machine, so you need to set it to your configured host:

<property>
 <name>mapreduce.jobhistory.address</name>
 <value>host:port</value>
</property>

Then run this command:

mr-jobhistory-daemon.sh start historyserver
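To see which address clients will actually try, you can read the property back out of mapred-site.xml. This is a minimal Python sketch, assuming the standard Hadoop configuration layout (configuration/property/name/value); jobhistory_address is a hypothetical helper name, and the fallback 0.0.0.0:10020 is the stock default of mapreduce.jobhistory.address. It does not expand ${...} variables or consult any other configuration files.

```python
import xml.etree.ElementTree as ET

# Stock default of mapreduce.jobhistory.address (from mapred-default.xml).
DEFAULT_JOBHISTORY_ADDRESS = "0.0.0.0:10020"


def jobhistory_address(mapred_site_path: str) -> str:
    """Return mapreduce.jobhistory.address as set in a mapred-site.xml file.

    Falls back to the built-in default when the property is absent, in which
    case clients connect to 0.0.0.0:10020.
    """
    root = ET.parse(mapred_site_path).getroot()
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "mapreduce.jobhistory.address":
            return (prop.findtext("value") or "").strip()
    return DEFAULT_JOBHISTORY_ADDRESS


if __name__ == "__main__":
    import sys

    # Usage (path is an example): python check_jh.py $HADOOP_CONF_DIR/mapred-site.xml
    print(jobhistory_address(sys.argv[1]))
```

If this prints 0.0.0.0:10020 on a multi-node cluster, the property has not been set and clients on other machines will look for the history server locally.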

prabhugs · Answer

I am using Hadoop 2.6.0, so I had to run:

$ mr-jobhistory-daemon.sh --config /usr/local/hadoop/etc start historyserver

where /usr/local/hadoop/etc is my HADOOP_CONF_DIR.
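If you start the history server from a script, one way to avoid hard-coding that directory is to take it from the HADOOP_CONF_DIR environment variable. This is a minimal Python sketch under that assumption; history_server_cmd is a hypothetical helper, and it only adds --config when a directory is known, otherwise leaving the daemon script to use its own default.

```python
import os
import subprocess
from typing import List, Optional


def history_server_cmd(conf_dir: Optional[str] = None) -> List[str]:
    """Build the command line that starts the MapReduce JobHistoryServer.

    Hypothetical helper: conf_dir defaults to $HADOOP_CONF_DIR; if neither is
    set, --config is omitted. --config must precede the start command.
    """
    conf_dir = conf_dir or os.environ.get("HADOOP_CONF_DIR")
    cmd = ["mr-jobhistory-daemon.sh"]
    if conf_dir:
        cmd += ["--config", conf_dir]
    cmd += ["start", "historyserver"]
    return cmd


if __name__ == "__main__":
    # Assumes mr-jobhistory-daemon.sh is on PATH (it ships in Hadoop's sbin directory).
    subprocess.run(history_server_cmd(), check=True)
```

For example, history_server_cmd("/usr/local/hadoop/etc") yields the same command as the one shown in this answer.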

Tags: hadoop, apache-pig
