I need two files as input to my MapReduce program: City.dat and Country.dat.
In my main method I'm parsing the command-line arguments like this:
Path cityInputPath = new Path(args[0]);
Path countryInputPath = new Path(args[1]);
Path outputPath = new Path(args[2]);
MultipleInputs.addInputPath(job, countryInputPath, TextInputFormat.class, JoinCountryMapper.class);
MultipleInputs.addInputPath(job, cityInputPath, TextInputFormat.class, JoinCityMapper.class);
FileOutputFormat.setOutputPath(job, outputPath);
If I now run my program with the following command:
hadoop jar capital.jar org.myorg.Capital /user/cloudera/capital/input/City.dat /user/cloudera/capital/input/Country.dat /user/cloudera/capital/output
I get the following error:
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /user/cloudera/capital/input/Country.dat already exists
Why does it treat this as my output directory? I specified another directory as the output directory. Can somebody explain this?
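Based on the stack trace, the path Hadoop is using as the output directory already exists, and Hadoop refuses to overwrite an existing output path. So the simplest thing is to delete your output directory before running the job (on newer Hadoop versions use -rm -r, since -rmr is deprecated):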
bin/hadoop fs -rmr /user/cloudera/capital/output
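Besides that, your arguments start with the class name of your main class, org.myorg.Capital, so that is the argument at index zero (based on the stack trace and the code you have provided). This typically happens when the jar's manifest already names a main class, in which case hadoop jar passes the class name through as an ordinary argument. That is why Country.dat, the third value on the command line, ends up in args[2] and is treated as the output directory.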
Basically you need to shift all your indices one to the right:
Path cityInputPath = new Path(args[1]);
Path countryInputPath = new Path(args[2]);
Path outputPath = new Path(args[3]);
MultipleInputs.addInputPath(job, countryInputPath, TextInputFormat.class, JoinCountryMapper.class);
MultipleInputs.addInputPath(job, cityInputPath, TextInputFormat.class, JoinCityMapper.class);
FileOutputFormat.setOutputPath(job, outputPath);
Don't forget to clear your output folder though!
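The index shift above depends on whether the class name actually arrives as the first argument. A minimal sketch of a more defensive approach follows, assuming a hypothetical helper class CapitalArgs (not part of Hadoop or of your code) that drops a leading class name if it is present and then checks that exactly three paths remain:

```java
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop: normalizes the driver's arguments.
final class CapitalArgs {

    // Drops args[0] if it equals the given main class name; otherwise returns args unchanged.
    static String[] stripMainClass(String[] args, String mainClassName) {
        if (args.length > 0 && args[0].equals(mainClassName)) {
            return Arrays.copyOfRange(args, 1, args.length);
        }
        return args;
    }

    // Returns {cityInput, countryInput, output} or fails with a usage message.
    static String[] requireThreePaths(String[] args, String mainClassName) {
        String[] paths = stripMainClass(args, mainClassName);
        if (paths.length != 3) {
            throw new IllegalArgumentException(
                "Usage: <city input> <country input> <output>, got "
                    + paths.length + " argument(s)");
        }
        return paths;
    }
}
```

In the main method you could then call CapitalArgs.requireThreePaths(args, Capital.class.getName()) and build the three Path objects from indices 0, 1 and 2 of the result, so the job behaves the same whether or not the class name is passed through.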
Also, a small tip: you can separate the input files with a comma (",") and set them with a single call, like this:
hadoop jar capital.jar org.myorg.Capital /user/cloudera/capital/input/City.dat,/user/cloudera/capital/input/Country.dat
And in your Java code (note that, unlike MultipleInputs, this uses the same mapper for every path):
FileInputFormat.addInputPaths(job, args[1]);
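If you go the comma-separated route, it can help to check the argument before handing it to Hadoop. The following is a rough sketch using plain string handling only. InputPathList is a hypothetical helper, and this is not Hadoop's own parser, which may treat glob patterns differently:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: inspects a comma-separated path argument.
final class InputPathList {

    // Splits "a,b" into trimmed, non-empty entries, preserving order.
    static List<String> split(String commaSeparated) {
        List<String> paths = new ArrayList<>();
        for (String part : commaSeparated.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }
}
```

A driver could, for example, verify that InputPathList.split(args[1]) contains the two expected files and print a usage message otherwise.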