
How to use S3DistCp in Java code

I want to copy the output of a job from an EMR cluster to Amazon S3 programmatically.

How can I use S3DistCp from Java code to do this?

user2664210 asked May 02 '26 19:05



1 Answer

Hadoop's ToolRunner can run this, since S3DistCp implements the Tool interface.

Below is the usage example:

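import java.util.Arrays;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.util.ToolRunner;
import com.amazon.external.elasticmapreduce.s3distcp.S3DistCp;

public class CustomS3DistCP {
  private static final Log log = LogFactory.getLog(CustomS3DistCP.class);

  public static void main(String[] args) throws Exception {
    // Arrays.toString logs the argument values rather than the array reference
    log.info("Running with args: " + Arrays.toString(args));

    // ToolRunner handles the generic Hadoop options, then runs the tool;
    // the tool's return value becomes the process exit code
    System.exit(ToolRunner.run(new S3DistCp(), args));
  }
}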

You need the s3distcp jar on your classpath. You can call this program from a shell script.
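
For the args themselves, here is a minimal sketch, assuming you only need S3DistCp's --src and --dest options. The class name S3DistCpArgs and both paths are hypothetical placeholders, not part of S3DistCp. The helper uses only the standard library; the ToolRunner call stays in a comment because it needs the Hadoop and s3distcp jars.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of S3DistCp): builds the argument array
// that would be handed to ToolRunner.run(new S3DistCp(), args).
class S3DistCpArgs {

    // Returns {"--src", src, "--dest", dest}; rejects missing values early.
    static String[] build(String src, String dest) {
        if (src == null || src.isEmpty() || dest == null || dest.isEmpty()) {
            throw new IllegalArgumentException("src and dest must be non-empty");
        }
        List<String> args = new ArrayList<>();
        args.add("--src");
        args.add(src);
        args.add("--dest");
        args.add(dest);
        return args.toArray(new String[0]);
    }

    public static void main(String[] ignored) {
        // Placeholder paths; substitute your job output and bucket.
        String[] args = build("hdfs:///output", "s3://my-bucket/output");
        System.out.println(Arrays.toString(args));

        // With the Hadoop and s3distcp jars on the classpath, the copy
        // itself would then be started with:
        // int exitCode = ToolRunner.run(new S3DistCp(), args);
    }
}
```

Building the array in code, rather than passing the shell arguments through unchanged, lets the same program pick the source and destination at runtime.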

Hope that helps!

Ram Ghadiyaram answered May 05 '26 09:05
