Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Copying file from s3:// to local file system

I am an aws newbie. I created a cluster and ssh'ed into the master node. When I am trying to copy files from s3://my-bucket-name/ to local file://home/hadoop folder in pig using:

cp s3://my-bucket-name/path/to/file file://home/hadoop

i get the error:

2013-06-08 18:59:00,267 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 29 99: Unexpected internal error. AWS Access Key ID and Secret Access Key must be s pecified as the username or password (respectively) of a s3 URL, or by setting t he fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).

I can not even ls into my s3 bucket. I set the AWS_ACCESS_KEY and AWS_SECRET_KEY without success. Also I could not locate config file for pig to set the appropriate fields.

Any help please?

Edit: I tried to load file in pig using the full s3n:// uri

grunt> raw_logs = LOAD 's3://XXXXX/input/access_log_1' USING TextLoader a
s (line:chararray);
grunt> illustrate raw_logs;

and I get the following error:

2013-06-08 19:28:33,342 [main] INFO org.apache.pig.backend.hadoop.executionengi ne.HExecutionEngine - Connecting to hadoop file system at: file:/// 2013-06-08 19:28:33,404 [main] INFO org.apache.pig.backend.hadoop.executionengi ne.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? fal se 2013-06-08 19:28:33,404 [main] INFO org.apache.pig.backend.hadoop.executionengi ne.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2013-06-08 19:28:33,405 [main] INFO org.apache.pig.backend.hadoop.executionengi ne.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2013-06-08 19:28:33,405 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job 2013-06-08 19:28:33,429 [main] INFO org.apache.pig.backend.hadoop.executionengi ne.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percen t is not set, set to default 0.3 2013-06-08 19:28:33,430 [main] ERROR org.apache.pig.pen.ExampleGenerator - Error reading data. Internal error creating job configuration. java.lang.RuntimeException: Internal error creating job configuration. at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java :160) at org.apache.pig.PigServer.getExamples(PigServer.java:1244) at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser. java:722) at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigS criptParser.java:591) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScript Parser.java:306) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.j ava:189) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.j ava:165) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69) at org.apache.pig.Main.run(Main.java:500) at org.apache.pig.Main.main(Main.java:114) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl. java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces sorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:187) 2013-06-08 19:28:33,432 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 29 97: Encountered IOException. Exception : Internal error creating job configurati on. Details at logfile: /home/hadoop/pig_1370719069857.log

like image 467
VJune Avatar asked Dec 15 '22 10:12

VJune


1 Answers

First off, you should use the s3n protocol (unless you stored the files on s3 using the s3 protocol) - s3 is used for block storage (i.e. similar to hdfs, only on s3) and s3n is for native s3 file system (i.e. you get what you see there).

You can use distcp or a simple pig load from s3n. You can either supply the access & secret in hadoop-site.xml as specified in the exception you got (see here for more info: http://wiki.apache.org/hadoop/AmazonS3), or you can add them to the uri:

raw_logs = LOAD 's3n://access:secret@XXXXX/input/access_log_1' USING TextLoader AS (line:chararray);

Make sure that your secret doesn't contain back-slashes - otherwise it won't work.

like image 56
SNeumann Avatar answered Jan 16 '23 22:01

SNeumann