I am trying to run an example hadoop-streaming command:
hadoop-streaming -files streamingCode/wordSplitter.py \
-mapper wordSplitter.py \
-input s3://elasticmapreduce/samples/wordcount/input \
-output streamingCode/wordCountOut \
-reducer aggregate
but I keep getting this error:
Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Moved Permanently (Service: Amazon S3; Status Code: 301; Error Code: 301 Moved Permanently; Request ID: 98038E504E150CEC), S3 Extended Request ID: IW1x5otBSepAnPgW/RKELCUI9dhADQvrXqU2Ase1CLIa0SWDFnBbTscXihrvHvNm2ZRxjjSJZ1Q=
I think that it is because my cluster is in us-west-2, but I can't figure out how to properly format the S3 URL (or perhaps that is not the issue at all).
Edit: After changing it to the following URL:
s3://s3-us-west-2.amazonaws.com/elasticmapreduce/samples/wordcount/input
I am now getting the following error:
Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: BC8DB415C780DF84), S3 Extended Request ID: sx8W/+gvND2ssqQce9ZQsZTiqxmSJYZs8OiXgrjwL3dm0JRPaC7ceHor+yrHsPuKTjM2LUwkRAw=
Edit: I have confirmed that the error is indeed because my cluster is in us-west-2: I created a cluster in us-east-1 and it works properly. So the question is: how do I access an S3 bucket from another region? Is this even possible?
Amazon changed the default behavior starting with emr-4.7.0, which caused this error when we upgraded EMR versions.
The solution is simple: add this configuration to core-site: fs.s3n.endpoint=s3.amazonaws.com
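For example, a minimal sketch of supplying that property at cluster creation time through the EMR configurations API, under the core-site classification (the cluster name, instance type, and instance count below are placeholders, not from the original question):

aws emr create-cluster \
  --name "streaming-cluster" \
  --release-label emr-4.7.0 \
  --applications Name=Hadoop \
  --use-default-roles \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --configurations '[{"Classification":"core-site","Properties":{"fs.s3n.endpoint":"s3.amazonaws.com"}}]'

On an already running cluster, the same property can instead be added to core-site.xml on the master node and picked up by subsequent jobs.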