I create an AWS IAM role called "my-role" specifying EC2 as the trusted entity, i.e. using the following trust relationship policy document:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
The role has the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion",
        "s3:GetBucketAcl",
        "s3:GetBucketCORS",
        "s3:GetBucketLocation",
        "s3:GetBucketLogging",
        "s3:GetBucketNotification",
        "s3:GetBucketPolicy",
        "s3:GetBucketRequestPayment",
        "s3:GetBucketTagging",
        "s3:GetBucketVersioning",
        "s3:GetBucketWebsite",
        "s3:GetLifecycleConfiguration",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:GetObjectTorrent",
        "s3:GetObjectVersion",
        "s3:GetObjectVersionAcl",
        "s3:GetObjectVersionTorrent",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:PutObjectVersionAcl",
        "s3:RestoreObject"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}
I launch an EC2 instance (Amazon Linux 2014.09.1) from the command line using the AWS CLI, specifying "my-role" as the instance profile, and everything works out fine. I verify that the instance effectively assumes "my-role" by running:
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
to query the instance metadata, from which I get the response my-role, and:
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/my-role
from which I get the temporary credentials associated with "my-role". An example of such a credentials-retrieval response is something like:
{
  "Code" : "Success",
  "LastUpdated" : "2015-01-19T10:37:35Z",
  "Type" : "AWS-HMAC",
  "AccessKeyId" : "an-access-key-id",
  "SecretAccessKey" : "a-secret-access-key",
  "Token" : "a-token",
  "Expiration" : "2015-01-19T16:47:09Z"
}
From the instance I am also able to successfully run:
aws s3 ls s3://my-bucket/
from which I correctly get a listing of the first-level subdirectories under "my-bucket". (The AWS CLI comes installed and configured by default when launching this AMI; the EC2 instance and the S3 bucket are within the same AWS account.)
I install and run a Tomcat 7 server and container on that instance, on which I deploy a J2EE 1.7 servlet with no issues.
The servlet should download a file from an S3 bucket to the local file system, specifically from s3://my-bucket/custom-path/file.tar.gz, using the Hadoop Java APIs. (Please note that I tried the hadoop-common artifact in versions 2.4.x, 2.5.x, and 2.6.x with no positive results. I will post below the exception I get when using 2.5.x.)
Within the servlet, I retrieve fresh credentials from the instance metadata URL mentioned above and use them to configure my Hadoop API instance:
...
Path path = new Path("s3n://my-bucket/");
Configuration conf = new Configuration();
conf.set("fs.defaultFS", path.toString());
conf.set("fs.s3n.awsAccessKeyId", myAwsAccessKeyId);
conf.set("fs.s3n.awsSecretAccessKey", myAwsSecretAccessKey);
conf.set("fs.s3n.awsSessionToken", mySessionToken);
...
Obviously, myAwsAccessKeyId, myAwsSecretAccessKey, and mySessionToken are Java variables that I previously set with the actual values.
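For completeness, the retrieval looks roughly like this (a simplified sketch; the helper class name and the org.json parser are just illustrative choices, and real code should refresh the credentials before the Expiration timestamp):

// Hypothetical helper (illustrative): fetch the temporary credentials for a role
// from the instance metadata service and return {AccessKeyId, SecretAccessKey, Token}.
import java.io.InputStream;
import java.net.URL;
import java.util.Scanner;
import org.json.JSONObject; // example JSON library; any parser works

public class InstanceMetadataCredentials {
    public static String[] fetch(String roleName) throws Exception {
        URL url = new URL(
            "http://169.254.169.254/latest/meta-data/iam/security-credentials/" + roleName);
        try (InputStream in = url.openStream();
             Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            JSONObject creds = new JSONObject(scanner.next()); // whole response body
            return new String[] {
                creds.getString("AccessKeyId"),
                creds.getString("SecretAccessKey"),
                creds.getString("Token")
            };
        }
    }
}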
Then, I effectively get a FileSystem instance, using:
FileSystem fs = path.getFileSystem(conf);
I am able to retrieve all the configuration related to the FileSystem (via fs.getConf().get(<key-name>)) and verify that everything is configured as assumed.
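For instance, a quick check along these lines (illustrative) prints the expected values back:

// Sanity check: read back the credential-related properties from the
// FileSystem's configuration.
System.out.println(fs.getConf().get("fs.s3n.awsAccessKeyId"));
System.out.println(fs.getConf().get("fs.s3n.awsSecretAccessKey"));
System.out.println(fs.getConf().get("fs.s3n.awsSessionToken"));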
However, I cannot download s3://my-bucket/custom-path/file.tar.gz using:
...
fs.copyToLocalFile(false, new Path(path.toString()+"custom-path/file.tar.gz"), outputLocalPath);
...
If I use hadoop-common 2.5.x, I get this IOException:
org.apache.hadoop.security.AccessControlException: Permission denied: s3n://my-bucket/custom-path/file.tar.gz
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:449)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at org.apache.hadoop.fs.s3native.$Proxy12.retrieveMetadata(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:467)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1968)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1937)
    ...
If I use hadoop-common 2.4.x, I get a NullPointerException:
java.lang.NullPointerException
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:433)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1968)
    at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1937)
    ...
Just for the record, if I DON'T set any AWS credentials at all, I get:
AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).
And if, from the instance's command line, I run:
<hadoop-dir>/bin/hadoop fs -cp s3n://<aws-access-key-id>:<aws-secret-access-key>@my-bucket/custom-path/file.tar.gz .
I get, once again, an NPE:
Fatal internal error
java.lang.NullPointerException
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.listStatus(NativeS3FileSystem.java:479)
    at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
    at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
    at org.apache.hadoop.fs.shell.Ls.processPathArgument(Ls.java:96)
    at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
    at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
    at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:255)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:308)
Sorry for the long post; I just tried to be as detailed as I could. Thanks for any help.
You are using STS/temporary AWS credentials; these do not appear to be currently supported by the s3 or s3n FileSystem implementations in Hadoop.
AWS STS/temporary credentials include not only an access key and a secret key, but also a session token. The Hadoop s3 and s3n FileSystem implementations do not yet support inclusion of the session token (i.e. your configuration of fs.s3n.awsSessionToken is unsupported and is ignored by the s3n FileSystem).
From the AmazonS3 page of the Hadoop Wiki (note there is no mention of fs.s3.awsSessionToken):

Configuring to use s3/s3n filesystems

Edit your core-site.xml file to include your S3 keys:

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>SECRET</value>
</property>
If you take a look at S3Credentials.java from apache/hadoop on github.com, you'll notice that the notion of a session token is completely missing from the representation of S3 credentials.
There was a patch submitted to address this limitation (detailed here); however, it hasn't been integrated.
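To make the limitation concrete, the credential lookup on the s3/s3n code path boils down to something like the following (a paraphrase for illustration, not the literal Hadoop source):

// Illustrative paraphrase of how the s3/s3n FileSystems resolve credentials:
// only an access key and a secret key are ever read; there is no property or
// field for a session token.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;

public class S3nCredentialLookupSketch {
    static String[] lookup(URI uri, Configuration conf) {
        String scheme = uri.getScheme(); // "s3" or "s3n"
        String accessKey = conf.get("fs." + scheme + ".awsAccessKeyId");
        String secretKey = conf.get("fs." + scheme + ".awsSecretAccessKey");
        // Credentials embedded in the URI (<key>:<secret>@bucket) would also be
        // parsed here, but fs.s3n.awsSessionToken is never consulted, so the
        // temporary credentials from the instance metadata cannot be passed through.
        return new String[] { accessKey, secretKey };
    }
}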
Alternatively, you could try the s3a FileSystem that was added in Hadoop 2.6.0. It claims to support IAM role-based authentication (i.e. you wouldn't have to explicitly specify the keys at all).
A Hadoop JIRA ticket describes how to configure the s3a FileSystem:
From https://issues.apache.org/jira/browse/HADOOP-10400:

fs.s3a.access.key - Your AWS access key ID (omit for role authentication)
fs.s3a.secret.key - Your AWS secret key (omit for role authentication)
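For example, a download via s3a relying on the instance profile might look roughly like this (a sketch, assuming the hadoop-aws 2.6.0 artifact and its AWS SDK dependency are on the classpath; since no keys are set, the FileSystem should fall back to the IAM role credentials):

// Sketch: copy an object to the local file system via s3a with role-based credentials.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3ADownloadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Some distributions may require the implementation class to be set explicitly:
        conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");

        Path src = new Path("s3a://my-bucket/custom-path/file.tar.gz");
        FileSystem fs = src.getFileSystem(conf);
        fs.copyToLocalFile(false, src, new Path("/tmp/file.tar.gz"));
    }
}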