After creating the Amazon S3 bucket, my_bucket, I created an Elastic MapReduce cluster via the CLI:
aws emr create-cluster --name "Hive testing" --ami-version 3.3 --applications Name=Hive --use-default-roles --instance-type m3.xlarge --instance-count 3 --steps Type=Hive,Name="Hive Program",Args=[-d,INPUT=s3://my_bucket/input,-d,OUTPUT=s3://my_bucket/input,-d,LIBS=s3://my_bucket/serde_libs]
Note that I did not specify a Hive *.q file. After creating the S3 bucket and the EMR cluster, I plan to log onto the EMR box and run hive interactively.
Note: I'm assuming there's an EMR box I can log onto.
However, when I ran aws emr describe-cluster --cluster-id XYZ, I saw this error in the output:
"State": "TERMINATED_WITH_ERRORS",
"StateChangeReason": {
"Message": "EMR service role arn:aws:iam::xyz:role/EMR_DefaultRole is invalid",
"Code": "VALIDATION_ERROR"
}
What would cause this error? Do I need to open permissions on the S3 bucket for the EMR cluster to access it?
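The issue is not with the bucket: the IAM role the cluster expects is missing. The --use-default-roles flag assumes that the default roles (such as EMR_DefaultRole, named in the error) already exist in your account.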
See http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles-creatingroles.html#emr-iam-roles-createdefaultwithcli
Issue the AWS CLI command:
aws emr create-default-roles
Then create the cluster again. This is a one-time step needed to create the default roles.
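As a rough sketch of that fix, the check and the one-time creation could be wrapped in a small shell function. The function name ensure_emr_default_roles is made up for illustration, and checking only EMR_DefaultRole (the role named in the error) is a simplifying assumption; aws iam get-role and aws emr create-default-roles are the only CLI calls used.

```shell
#!/bin/sh
# Hypothetical helper: create the EMR default roles only when the
# service role from the error message is missing.
# Assumption: the presence of EMR_DefaultRole is treated as a proxy
# for "the default roles have already been created".
ensure_emr_default_roles() {
    if aws iam get-role --role-name EMR_DefaultRole >/dev/null 2>&1; then
        echo "EMR_DefaultRole already exists"
    else
        echo "EMR_DefaultRole missing; creating default roles"
        aws emr create-default-roles
    fi
}
```

Calling ensure_emr_default_roles before aws emr create-cluster should make the --use-default-roles flag safe to use, provided the credentials in use are allowed to read and create IAM roles.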
Note: make sure you are using a recent version of the AWS CLI; I had problems with 1.4 (the Debian jessie package).
Note 2 (taken from the mrjob code and Amazon announcements): an instance profile and service role are required for accounts created after April 6, 2015, and will eventually be required for all accounts.