I am using Ubuntu 12.04 (32-bit) and have installed Hadoop 2.2.0 and Pig 0.12 successfully. Hadoop runs properly on my system.
However, whenever I run the following script:

aatoz = load 'atoz.csv' using PigStorage(',') as (aa1:int, bb1:int, cc1:int, dd1:chararray);
dump aatoz;

I get the following error:
ERROR org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl - Error while trying to run jobs. java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
Here is the full stack trace:
> 2014-01-23 10:41:44,998 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> 2014-01-23 10:41:45,000 [Thread-9] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2014-01-23 10:41:45,001 [Thread-9] ERROR org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl - Error while trying to run jobs.
> java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:225)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:186)
>     at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
>     at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>     at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
>     at java.lang.Thread.run(Thread.java:724)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
> 2014-01-23 10:41:45,498 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> 2014-01-23 10:41:45,502 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job null has failed! Stop running all dependent jobs
> 2014-01-23 10:41:45,503 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2014-01-23 10:41:45,507 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: Unexpected System Error Occured: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:225)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:186)
>     at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
>     at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>     at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
>     at java.lang.Thread.run(Thread.java:724)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
> 2014-01-23 10:41:45,507 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2014-01-23 10:41:45,507 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Detected Local mode. Stats reported below may be incomplete
> 2014-01-23 10:41:45,508 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
>
> HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
> 2.2.0          0.10.1      hardik  2014-01-23 10:41:44  2014-01-23 10:41:45  UNKNOWN
>
> Failed!
>
> Failed Jobs:
> JobId  Alias  Feature   Message  Outputs
> N/A    aatoz  MAP_ONLY  Message: Unexpected System Error Occured: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:225)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:186)
>     at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:456)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
>     at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>     at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:240)
>     at java.lang.Thread.run(Thread.java:724)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:260)
>     file:/tmp/temp1979716161/tmp-189979005,
>
> Input(s):
> Failed to read data from "file:///home/hardik/pig10/bin/input/atoz.csv"
>
> Output(s):
> Failed to produce result in "file:/tmp/temp1979716161/tmp-189979005"
>
> Job DAG:
> null
>
> 2014-01-23 10:41:45,509 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
> 2014-01-23 10:41:45,510 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias aatoz
> Details at logfile: /home/hardik/pig10/bin/pig_1390453192689.log
Just building with the command "ant clean jar-all -Dhadoopversion=23" is not enough if you are using Maven dependencies in your project. You will need to install the jar created by this build into your local Maven repository, or use the following dependency (note the "classifier" tag for Hadoop 2) in your pom.xml:
<dependency>
<groupId>org.apache.pig</groupId>
<artifactId>pig</artifactId>
<classifier>h2</classifier>
<version>0.13.0</version>
</dependency>
Apache Pig 0.12.0 expects an older version of Hadoop by default. You must recompile Pig for Hadoop 2.2.0 and replace the two original jars with the newly built pig-0.12.1-SNAPSHOT.jar and pig-0.12.1-SNAPSHOT-withouthadoop.jar.
To recompile, unpack the Pig archive, go into the "pig-0.12.0" directory, and just run:
ant clean jar-all -Dhadoopversion=23
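End to end, the sequence might look roughly like this (a sketch: the tarball name and PIG_HOME are illustrative, and the exact location of the rebuilt jars in the build tree can vary between Pig versions, hence the find):

# Unpack the Pig source release and rebuild it against Hadoop 2.x
tar xzf pig-0.12.0.tar.gz
cd pig-0.12.0
ant clean jar-all -Dhadoopversion=23
# Locate the rebuilt jars; where they land in the build tree can vary
find . -name "pig-*SNAPSHOT*.jar"
# Copy them over the jars your Pig installation currently uses
cp pig-0.12.1-SNAPSHOT.jar pig-0.12.1-SNAPSHOT-withouthadoop.jar "$PIG_HOME/"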
I resolved it in another way.
I hit the same problem on CDH 4.4 and Pig 0.11.0 when my Pig script invoked a UDF from my Java project, which was compiled using Maven.
I looked at the /usr/lib/pig/conf/build.properties file and verified the versions listed against the hadoop-core, hadoop-common, and hadoop-mapreduce properties, then made sure all of those artifacts, at those same versions, were included as dependencies in my Java project's pom.xml.
(In fact, hadoop-mapreduce has 6 artifact IDs, per http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_31.html. I included all of them in the dependencies list.)
After building my project's jar file with these POM settings, the Pig script was able to invoke the UDF without any problem.
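A quick way to see which Hadoop versions your Pig build expects is to grep that file; a minimal sketch (the exact property names in build.properties may differ between distributions):

# Print the Hadoop artifact versions this Pig build was compiled against;
# property names may vary by distribution, so adjust the pattern if needed.
grep -E 'hadoop.(core|common|mapreduce)' /usr/lib/pig/conf/build.properties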