
Hadoop : java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text

My program looks like this:

public class TopKRecord extends Configured implements Tool {

    public static class MapClass extends Mapper<Text, Text, Text, Text> {

        public void map(Text key, Text value, Context context) throws IOException, InterruptedException {
            // your map code goes here
            String[] fields = value.toString().split(",");
            String year = fields[1];
            String claims = fields[8];

            if (claims.length() > 0 && (!claims.startsWith("\""))) {
                context.write(new Text(year.toString()), new Text(claims.toString()));
            }
        }
    }

    public int run(String args[]) throws Exception {
        Job job = new Job();
        job.setJarByClass(TopKRecord.class);

        job.setMapperClass(MapClass.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setJobName("TopKRecord");
        job.setMapOutputValueClass(Text.class);
        job.setNumReduceTasks(0);
        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String args[]) throws Exception {
        int ret = ToolRunner.run(new TopKRecord(), args);
        System.exit(ret);
    }
}

The data looks like this:

"PATENT","GYEAR","GDATE","APPYEAR","COUNTRY","POSTATE","ASSIGNEE","ASSCODE","CLAIMS","NCLASS","CAT","SUBCAT","CMADE","CRECEIVE","RATIOCIT","GENERAL","ORIGINAL","FWDAPLAG","BCKGTLAG","SELFCTUB","SELFCTLB","SECDUPBD","SECDLWBD"
3070801,1963,1096,,"BE","",,1,,269,6,69,,1,,0,,,,,,,
3070802,1963,1096,,"US","TX",,1,,2,6,63,,0,,,,,,,,,
3070803,1963,1096,,"US","IL",,1,,2,6,63,,9,,0.3704,,,,,,,
3070804,1963,1096,,"US","OH",,1,,2,6,63,,3,,0.6667,,,,,,,

On running this program I see the following on the console:

12/08/02 12:43:34 INFO mapred.JobClient: Task Id : attempt_201208021025_0007_m_000000_0, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.Text
    at com.hadoop.programs.TopKRecord$MapClass.map(TopKRecord.java:26)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

I believe the class types are mapped correctly in my Mapper class.

Please let me know what I am doing wrong here.

daydreamer asked Aug 02 '12


1 Answer

When you read a file with an M/R program using the default input format, the input key of your mapper is the byte offset of the line in the file, while the input value is the full line.

So what's happening here is that you're trying to receive the line offset as a Text object, which is wrong: Hadoop hands it to you as a LongWritable, so you need to declare a LongWritable instead so that Hadoop doesn't complain about the type.

Try this instead:

public class TopKRecord extends Configured implements Tool {

    public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // your map code goes here
            String[] fields = value.toString().split(",");
            String year = fields[1];
            String claims = fields[8];

            if (claims.length() > 0 && (!claims.startsWith("\""))) {
                context.write(new Text(year.toString()), new Text(claims.toString()));
            }
        }
    }

    ...
}

One more thing in your code you might want to reconsider: you're creating two new Text objects for every record you process. Instead, create these two objects once as fields of the mapper, and in the map method just set their values with the set method. This will save you a lot of object allocation if you're processing a decent amount of data.
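A minimal sketch of that reuse pattern applied to the mapper above (Hadoop serializes the key and value during context.write, so it's safe to overwrite the same objects on the next call):

```java
public static class MapClass extends Mapper<LongWritable, Text, Text, Text> {

    // Created once per mapper instance and reused for every record
    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        String year = fields[1];
        String claims = fields[8];

        if (claims.length() > 0 && !claims.startsWith("\"")) {
            outKey.set(year);      // mutate in place instead of new Text(...)
            outValue.set(claims);
            context.write(outKey, outValue);
        }
    }
}
```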

Charles Menguy answered Oct 25 '22