 

YARN log aggregation on AWS EMR - UnsupportedFileSystemException

I am struggling to enable YARN log aggregation for my Amazon EMR cluster. I am following this documentation for the configuration:

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-debugging.html#emr-plan-debugging-logs-archive

Under the section titled: "To aggregate logs in Amazon S3 using the AWS CLI".

I've verified that the hadoop-config bootstrap action puts the following in yarn-site.xml:

<property><name>yarn.log-aggregation-enable</name><value>true</value></property>
<property><name>yarn.log-aggregation.retain-seconds</name><value>-1</value></property>
<property><name>yarn.log-aggregation.retain-check-interval-seconds</name><value>3000</value></property>
<property><name>yarn.nodemanager.remote-app-log-dir</name><value>s3://mybucket/logs</value></property>

I can run a sample job (pi from hadoop-examples.jar) and see in the ResourceManager's web UI that it completed successfully.

It even creates a folder under s3://mybucket/logs named with the application ID. But the folder is empty, and if I run yarn logs -applicationId <applicationId>, I get a stack trace:

14/10/20 23:02:15 INFO client.RMProxy: Connecting to ResourceManager at /10.XXX.XXX.XXX:9022
Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:154)
    at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:333)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:330)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
    at org.apache.hadoop.fs.FileContext.getFSofPath(FileContext.java:322)
    at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:85)
    at org.apache.hadoop.fs.FileContext.listStatus(FileContext.java:1388)
    at org.apache.hadoop.yarn.logaggregation.LogCLIHelpers.dumpAllContainersLogs(LogCLIHelpers.java:112)
    at org.apache.hadoop.yarn.client.cli.LogsCLI.run(LogsCLI.java:137)
    at org.apache.hadoop.yarn.client.cli.LogsCLI.main(LogsCLI.java:199) 

This doesn't make any sense to me; I can run hdfs dfs -ls s3://mybucket/ and it lists the contents just fine. The machines get their credentials from AWS IAM roles, and I've tried adding fs.s3n.awsAccessKeyId and related keys to core-site.xml with no change in behavior.
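For reference, the credential properties I tried look like this (these are the standard Hadoop s3n property names; the values shown here are placeholders, and with IAM roles they should not be needed at all):

<!-- placeholder credentials; with IAM roles the real values come from the instance profile -->
<property><name>fs.s3n.awsAccessKeyId</name><value>YOUR_ACCESS_KEY</value></property>
<property><name>fs.s3n.awsSecretAccessKey</name><value>YOUR_SECRET_KEY</value></property>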

Any advice is much appreciated.

asked Oct 20 '14 by mattwise

1 Answer

Hadoop provides two file-system interfaces: FileSystem and AbstractFileSystem. Most of the time we work with FileSystem, using configuration options like fs.s3.impl to plug in custom implementations.
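For illustration, a FileSystem binding for a scheme is registered in core-site.xml like this (the class shown is Hadoop's stock native S3 client; EMR substitutes its own implementation, so treat this as a sketch):

<!-- FileSystem-side binding for the s3:// scheme; the class name varies by distribution -->
<property><name>fs.s3.impl</name><value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value></property>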

yarn logs, however, goes through FileContext, which resolves paths via the AbstractFileSystem interface. That is why it fails with "No AbstractFileSystem for scheme: s3" even though hdfs dfs (which uses FileSystem) works fine.

If you can find an implementation of that interface for S3, you can register it with the fs.AbstractFileSystem.s3.impl property.
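Wiring it up would look like the sketch below; the class name is a placeholder, since stock Hadoop at this point does not ship an AbstractFileSystem implementation for the s3 scheme:

<!-- hypothetical class name; substitute a real AbstractFileSystem implementation for S3 -->
<property><name>fs.AbstractFileSystem.s3.impl</name><value>com.example.hadoop.fs.S3AbstractFileSystem</value></property>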

See core-default.xml for examples of fs.AbstractFileSystem.hdfs.impl etc.
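For reference, the stock bindings there look like this:

<!-- shipped in core-default.xml: AbstractFileSystem bindings for the built-in schemes -->
<property><name>fs.AbstractFileSystem.hdfs.impl</name><value>org.apache.hadoop.fs.Hdfs</value></property>
<property><name>fs.AbstractFileSystem.file.impl</name><value>org.apache.hadoop.fs.local.LocalFs</value></property>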

answered Oct 29 '22 by James Lim