
Does Spark allow using an AWS assumed role and STS temporary credentials for DynamoDB?

I need to fetch data from DynamoDB tables with Spark using Java. It works fine with a user's access key and secret key:

final JobConf jobConf = new JobConf(sc.hadoopConfiguration());
jobConf.set("dynamodb.servicename", "dynamodb");
jobConf.set("dynamodb.input.tableName", tableName);
jobConf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat");
jobConf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat");
jobConf.set("dynamodb.awsAccessKeyId",  accessKey);
jobConf.set("dynamodb.awsSecretAccessKey", secretKey);
jobConf.set("dynamodb.endpoint", endpoint);

For security reasons, I need to use an AWS assumed role with STS temporary credentials to fetch data from DynamoDB, specifically through Spark. Is that possible? I found that an assumed role can be used to access AWS S3 from Spark (https://issues.apache.org/jira/browse/HADOOP-12537, https://hadoop.apache.org/docs/current3/hadoop-aws/tools/hadoop-aws/index.html), but I haven't found anything similar for DynamoDB.

To obtain STS temporary credentials, I use the following code:

AWSSecurityTokenService stsClient = AWSSecurityTokenServiceClientBuilder.defaultClient();
AssumeRoleRequest assumeRequest = new AssumeRoleRequest()
        .withRoleArn(roleArn)  // arn:aws:iam::XXXXXXX:role/assume-role-DynamoDB-ReadOnly
        .withDurationSeconds(3600)
        .withRoleSessionName("assumed-role-session");
AssumeRoleResult assumeResult = stsClient.assumeRole(assumeRequest);
Credentials credentials = assumeResult.getCredentials();

Calling credentials.getAccessKeyId(), credentials.getSecretAccessKey() and credentials.getSessionToken() returns the generated temporary credentials. With these credentials I can successfully read data from DynamoDB using the AmazonDynamoDBClient from the AWS SDK for Java (the non-Spark approach).

Is this possible with Spark? Does Spark allow something like jobConf.set("dynamodb.awsSessionToken", sessionToken)?

Vasyl Sarzhynskyi asked Jun 01 '17 19:06



1 Answer

Looking through the connector code, you may be able to use the dynamodb.customAWSCredentialsProvider setting with com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider to get this working.

https://github.com/awslabs/emr-dynamodb-connector/blob/master/emr-dynamodb-hadoop/src/main/java/org/apache/hadoop/dynamodb/DynamoDBConstants.java#L30

https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/STSAssumeRoleSessionCredentialsProvider.html


EDIT: So this was a little harder than I first thought. I ended up implementing my own wrapper around STSAssumeRoleSessionCredentialsProvider.

package foo.bar;

import com.amazonaws.auth.AWSSessionCredentials;
import com.amazonaws.auth.AWSSessionCredentialsProvider;
import com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;

/**
 * Wraps STSAssumeRoleSessionCredentialsProvider so that it can be created without
 * constructor arguments and configured from the Hadoop Configuration via setConf().
 */
public class HadoopSTSAssumeRoleSessionCredentialsProvider
        implements AWSSessionCredentialsProvider, Configurable {

    // Configuration keys read in setConf(); both must be set on the job configuration.
    private static final String ROLE_ARN_CONF = "assumed.creds.role.arn";
    private static final String SESSION_NAME_CONF = "assumed.creds.session.name";

    private Configuration configuration;
    private STSAssumeRoleSessionCredentialsProvider delegate;

    @Override
    public AWSSessionCredentials getCredentials() {
        return delegate.getCredentials();
    }

    @Override
    public void refresh() {
        delegate.refresh();
    }

    // Builds the delegate from the role ARN and session name found in the configuration.
    @Override
    public void setConf(Configuration configuration) {
        this.configuration = configuration;
        String roleArn = configuration.get(ROLE_ARN_CONF);
        String sessionName = configuration.get(SESSION_NAME_CONF);

        if (roleArn == null || roleArn.isEmpty() || sessionName == null || sessionName.isEmpty()) {
            throw new IllegalStateException("Please set " + ROLE_ARN_CONF + " and "
                    + SESSION_NAME_CONF + " before use.");
        }
        delegate = new STSAssumeRoleSessionCredentialsProvider.Builder(
                roleArn, sessionName).build();
    }

    @Override
    public Configuration getConf() {
        return configuration;
    }
}

And then you can use it like this:

val ddbConf: JobConf = new JobConf(sc.hadoopConfiguration)

ddbConf.set("dynamodb.customAWSCredentialsProvider",
    "foo.bar.HadoopSTSAssumeRoleSessionCredentialsProvider")
// Placeholder values: substitute the real role ARN and a session name of your choice.
ddbConf.set("assumed.creds.role.arn", "roleArn")
ddbConf.set("assumed.creds.session.name", "sessionName")
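
For anyone wondering why the wrapper implements Configurable: as far as I can tell, the connector treats dynamodb.customAWSCredentialsProvider as a class name and creates the provider reflectively, handing it the job configuration afterwards. The sketch below only mimics that pattern in plain Java so it can be run without Hadoop or the AWS SDK. Everything in it is a hypothetical stand-in (ProviderLoadingSketch, its CredentialsProvider and Configurable interfaces, the Map used in place of a Hadoop Configuration, and the example role ARN); it is not the connector's actual code.

```java
import java.util.HashMap;
import java.util.Map;

class ProviderLoadingSketch {

    // Hypothetical stand-in for org.apache.hadoop.conf.Configurable,
    // using a plain Map instead of a Hadoop Configuration.
    interface Configurable {
        void setConf(Map<String, String> conf);
    }

    // Hypothetical stand-in for an AWS credentials provider.
    interface CredentialsProvider {
        String describe();
    }

    // Plays the role of the wrapper: it has an implicit no-argument
    // constructor and receives its settings through setConf().
    static class AssumedRoleProvider implements CredentialsProvider, Configurable {
        private String roleArn;
        private String sessionName;

        @Override
        public void setConf(Map<String, String> conf) {
            roleArn = conf.get("assumed.creds.role.arn");
            sessionName = conf.get("assumed.creds.session.name");
            if (roleArn == null || roleArn.isEmpty()
                    || sessionName == null || sessionName.isEmpty()) {
                throw new IllegalStateException("Role ARN and session name must be set.");
            }
        }

        @Override
        public String describe() {
            return roleArn + "/" + sessionName;
        }
    }

    // Assumed loading pattern: read the class name from the configuration,
    // create the instance without arguments, then pass it the configuration.
    static CredentialsProvider load(Map<String, String> conf) {
        String className = conf.get("dynamodb.customAWSCredentialsProvider");
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            if (instance instanceof Configurable) {
                ((Configurable) instance).setConf(conf);
            }
            return (CredentialsProvider) instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not create provider " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("dynamodb.customAWSCredentialsProvider", AssumedRoleProvider.class.getName());
        conf.put("assumed.creds.role.arn", "arn:aws:iam::123456789012:role/example-role");
        conf.put("assumed.creds.session.name", "example-session");
        System.out.println(load(conf).describe());
    }
}
```

If the connector does load providers this way, a class that can only be constructed with arguments (such as a provider built through a Builder taking a role ARN and session name) cannot be named directly in the setting, which would explain the need for a wrapper with a no-argument constructor.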
vkubushyn answered Oct 25 '22 07:10