Precisely following the step-by-step instructions on this page I am trying to export contents of one of my DynamoDB tables to an S3 bucket. I create a pipeline exactly as instructed but it fails to run. It seems that it has trouble identifying/running an EC2 resource to do the export. When I access EMR through AWS Console, I see entries like this:
Cluster: df-0..._@EmrClusterForBackup_2015-03-06T00:33:04Terminated with errorsEMR service role arn:aws:iam::...:role/DataPipelineDefaultRole is invalid
Why am I getting this message? Do I need to set up/configure something else for the pipeline to run?
UPDATE: UnderIAM->Roles
in AWS console I am seeing this for DataPipelineDefaultResourceRole
:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"s3:List*",
"s3:Put*",
"s3:Get*",
"s3:DeleteObject",
"dynamodb:DescribeTable",
"dynamodb:Scan",
"dynamodb:Query",
"dynamodb:GetItem",
"dynamodb:BatchGetItem",
"dynamodb:UpdateTable",
"rds:DescribeDBInstances",
"rds:DescribeDBSecurityGroups",
"redshift:DescribeClusters",
"redshift:DescribeClusterSecurityGroups",
"cloudwatch:PutMetricData",
"datapipeline:PollForTask",
"datapipeline:ReportTaskProgress",
"datapipeline:SetTaskStatus",
"datapipeline:PollForTask",
"datapipeline:ReportTaskRunnerHeartbeat"
],
"Resource": ["*"]
}]
}
And this for DataPipelineDefaultRole
:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": [
"s3:List*",
"s3:Put*",
"s3:Get*",
"s3:DeleteObject",
"dynamodb:DescribeTable",
"dynamodb:Scan",
"dynamodb:Query",
"dynamodb:GetItem",
"dynamodb:BatchGetItem",
"dynamodb:UpdateTable",
"ec2:DescribeInstances",
"ec2:DescribeSecurityGroups",
"ec2:RunInstances",
"ec2:CreateTags",
"ec2:StartInstances",
"ec2:StopInstances",
"ec2:TerminateInstances",
"elasticmapreduce:*",
"rds:DescribeDBInstances",
"rds:DescribeDBSecurityGroups",
"redshift:DescribeClusters",
"redshift:DescribeClusterSecurityGroups",
"sns:GetTopicAttributes",
"sns:ListTopics",
"sns:Publish",
"sns:Subscribe",
"sns:Unsubscribe",
"iam:PassRole",
"iam:ListRolePolicies",
"iam:GetRole",
"iam:GetRolePolicy",
"iam:ListInstanceProfiles",
"cloudwatch:*",
"datapipeline:DescribeObjects",
"datapipeline:EvaluateExpression"
],
"Resource": ["*"]
}]
}
Do these need to be modified somehow?
I ran into the same error.
In IAM, attach the AWSDataPipelineRole
managed policy to DataPipelineDefaultRole
I also had to update the Trust Relationship to the following (needed ec2 which is not in the documentation):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": [
"ec2.amazonaws.com",
"elasticmapreduce.amazonaws.com",
"datapipeline.amazonaws.com"
]
},
"Action": "sts:AssumeRole"
}
]
}
There is a similar question in AWS forum and it seems it is related to an issue with managed policies
https://forums.aws.amazon.com/message.jspa?messageID=606756
In that question, they recommend using specific inline policies for both access and trust policies to define those roles changing some permissions. Oddly enough, the specific inline policies can be found at
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-iam-roles.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With