I was trying to run the example program in Hadoop given here
when i try the run it I get a org.apache.hadoop.mapred.FileAlreadyExistsException
emil@psycho-O:~/project/hadoop-0.20.2$ bin/hadoop jar jar_files/wordcount.jar org.myorg.WordCount jar_files/wordcount/input jar_files/wordcount/output
11/02/06 14:54:23 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/02/06 14:54:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/home/emil/project/hadoop-0.20.2/jar_files/wordcount/input already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:111)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:772)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
at org.myorg.WordCount.main(WordCount.java:55)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
emil@psycho-O:~/project/hadoop-0.20.2$
Its from /home/emil/project/hadoop-0.20.2/jar_files/wordcount/input that I take my input files file01 and file02. When i googled i found out that this is done to prevent re-execution of same task. But in my case its the input file that is causing the exception. Is there anything wrong with my command because I don't see any posts with the same error for the wordcount problem. I am a newbie in java.
What could be the reason for this??
This is to prevent overwriting previous results. You can cleanup and delete the output path when creating and setting the job:
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
TextInputFormat.addInputPath(job,new Path(args[0]));
FileSystem.get(conf).delete(new Path(args[1]),true);
TextOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
I faced the same problem. Took me a while to figure out whats going on. The main problem was you could not attach a debugger to find out what values being passed.
you are using the args[0] as input and args[1] as output folder in your code.
Now, if you are using the new framework where you are consuming the command lines inside the run method of Tool class, args[0] is the name of the program being executed which is WordCount in this case.
args[1] is the name of the input folder you are specifying which is mapped into the output folder by the program and hence you are seeing the exception.
So the solution is:
use args[1] and args[2].
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With