Originally posted this to ServerFault, but posting here in the hopes that someone might have run into my issue.
I'm trying to set up a container to run on AWS Batch. I'm not doing anything fancy, more or less just following the default set-up with everything. I'm getting an error that seems to be related to the instance role or the permissions associated with the instance role.
The set-up goes without a hitch at first. I set up my compute environment, then my queue, then I add a basic job to the queue. The job gets stuck in the RUNNABLE state, and after 20 minutes or so my compute environment becomes "INVALID" with this error:
CLIENT_ERROR - Invalid IamInstanceProfile: arn:aws:iam::001234567890:role/ecsInstanceRole (Service: AmazonAutoScaling; Status Code: 400; Error Code: ValidationError; Request ID: blah)
I read this troubleshooting guide, which seems to tackle related problems (though they aren't quite exact matches). I've tried recreating the environment 5 or 6 times with no luck. I've also tried deleting my existing roles and letting the manager recreate them. Most of the problems in the troubleshooting guide seem to stem from roles that were set up incorrectly through the AWS CLI or by some means other than the Batch console. The guide even reads "the AWS Batch console only displays roles that have the correct trust relationship for compute environments". But I selected every role I've used via the console, which would seem to imply that they're correctly permissioned.
Not sure what to do here, grateful for any help.
Somewhat confusingly, the instanceRole property of an AWS Batch compute environment must reference an IAM instance profile ARN rather than an IAM role ARN. That is, the instanceRole value should look like arn:aws:iam::123456789012:instance-profile/ecsInstanceRole rather than arn:aws:iam::123456789012:role/ecsInstanceRole. The error message actually mentions instance profiles, though.
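To make the distinction concrete, here is a minimal Python sketch that classifies a value before you pass it as instanceRole. The function name classify_instance_role and the returned labels are hypothetical, made up for illustration; they are not part of any AWS SDK. The regular expressions only check the general ARN shape (partition, 12-digit account ID, resource type) and do not verify that the profile actually exists in your account. As far as I understand the Batch documentation, a bare instance profile name is also accepted, so a value that is not an ARN is reported separately rather than rejected.

```python
import re

# General shape of IAM ARNs: arn:<partition>:iam::<12-digit account>:<type>/<path and name>
# These patterns are an assumption about the format, not an official validator.
_PROFILE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:instance-profile/.+$")
_ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")


def classify_instance_role(value: str) -> str:
    """Classify a string intended for the Batch instanceRole property.

    Returns one of (hypothetical labels):
      "instance-profile-arn": the shape Batch expects
      "role-arn": the mistake described above
      "not-an-arn": possibly a bare instance profile name
    """
    if _PROFILE_ARN.match(value):
        return "instance-profile-arn"
    if _ROLE_ARN.match(value):
        return "role-arn"
    return "not-an-arn"


if __name__ == "__main__":
    for candidate in (
        "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole",
        "arn:aws:iam::123456789012:role/ecsInstanceRole",
        "ecsInstanceRole",
    ):
        print(candidate, "->", classify_instance_role(candidate))
```

A check like this would have flagged the ARN in the error message from the question, which has the role/ shape.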
The following CloudFormation snippet creates a valid Batch compute environment:
Parameters:
  VPC:
    Type: String
    Description: VPC ID of the target VPC
  Subnet:
    Type: List<AWS::EC2::Subnet::Id>
    Description: VPC subnet(s) for batch instances
  SG:
    Type: List<AWS::EC2::SecurityGroup::Id>
    Description: VPC Security group ID(s) for batch instances
Resources:
  MyBatchEnvironment:
    Type: "AWS::Batch::ComputeEnvironment"
    Properties:
      Type: MANAGED
      ServiceRole: !GetAtt MyBatchEnvironmentRole.Arn
      ComputeResources:
        MaxvCpus: 8
        SecurityGroupIds: !Ref SG
        Subnets: !Ref Subnet
        InstanceRole: !GetAtt MyBatchInstanceProfile.Arn
        MinvCpus: 0
        DesiredvCpus: 0
        Type: EC2
        InstanceTypes:
          - optimal
  MyBatchEnvironmentRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: {Service: "batch.amazonaws.com"}
            Action: "sts:AssumeRole"
      Path: /service-role/
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole"
  MyBatchInstanceRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: {Service: "ec2.amazonaws.com"}
            Action: "sts:AssumeRole"
      Path: /
      ManagedPolicyArns:
        - "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
  MyBatchInstanceProfile:
    Type: "AWS::IAM::InstanceProfile"
    Properties:
      Path: "/"
      Roles:
        - !Ref MyBatchInstanceRole
Thank you for bringing this to our attention. We have resolved the root cause of this issue and the console should now work as expected. Please give this another try and let us know if you encounter any further errors.
Jamie from the AWS Batch team