We are trying to set up Cloudera 5.5 where HDFS will work on S3 only. For that we have already configured the necessary properties in core-site.xml:
<property>
  <name>fs.s3a.access.key</name>
  <value>################</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>###############</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>s3a://bucket_Name</value>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>s3a://bucket_Name</value>
</property>
After setting this up, we were able to browse the files of the S3 bucket with the command
hadoop fs -ls /
and it shows only the files available on S3.
But when we start the YARN services, the JobHistory Server fails to start with the error below, and we get the same error when launching Pig jobs:
PriviledgedActionException as:mapred (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a
ERROR org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils
Unable to create default file context [s3a://kyvosps]
org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a
at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:154)
at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:337)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:334)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
Searching the Internet, we found that we also need to set the following properties in core-site.xml:
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  <description>The implementation class of the S3A Filesystem</description>
</property>
<property>
  <name>fs.AbstractFileSystem.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  <description>The FileSystem for S3A Filesystem</description>
</property>
After setting the above properties, we are getting the following error:
org.apache.hadoop.service.AbstractService
Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.<init>(java.net.URI, org.apache.hadoop.conf.Configuration)
java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.<init>(java.net.URI, org.apache.hadoop.conf.Configuration)
at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:131)
at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:157)
at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:337)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:334)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:334)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:451)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:473)
at org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils.getDefaultFileContext(JobHistoryUtils.java:247)
The jars needed for this are in place, but we are still getting the error. Any help would be great. Thanks in advance.
Update
I tried removing the property fs.AbstractFileSystem.s3a.impl, but it gives me the same first exception I was getting previously, which is:
org.apache.hadoop.security.UserGroupInformation
PriviledgedActionException as:mapred (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a
ERROR org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils
Unable to create default file context [s3a://bucket_name]
org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3a
at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:154)
at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:242)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:337)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:334)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:334)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:451)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:473)
The problem is not with the location of the jars.
The problem is with the setting:
<property>
  <name>fs.AbstractFileSystem.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  <description>The FileSystem for S3A Filesystem</description>
</property>
This setting is not needed. Because of this setting, Hadoop searches for the following constructor in the S3AFileSystem class, and there is no such constructor:
S3AFileSystem(URI theUri, Configuration conf);
The following exception clearly tells us that it is unable to find a constructor for S3AFileSystem with URI and Configuration parameters:
java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.<init>(java.net.URI, org.apache.hadoop.conf.Configuration)
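To make the mechanism concrete, here is a minimal Java sketch (the bucket name is a placeholder) of the two client APIs involved: the FileSystem API, which only needs fs.s3a.impl, and the FileContext API used by JobHistoryUtils, which additionally needs an AbstractFileSystem binding for the s3a scheme.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3aSchemeCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    URI bucket = new URI("s3a://bucket_Name/");

    // FileSystem.get() resolves the scheme through fs.s3a.impl, instantiates
    // S3AFileSystem with its no-arg constructor and then calls initialize(uri, conf).
    // This is why "hadoop fs -ls /" works with only fs.s3a.impl set.
    FileSystem fs = FileSystem.get(bucket, conf);
    fs.listStatus(new Path("/"));

    // FileContext (used by the JobHistory server) resolves the scheme through
    // fs.AbstractFileSystem.s3a.impl and instantiates that class reflectively via a
    // (URI, Configuration) constructor. S3AFileSystem has no such constructor,
    // hence the NoSuchMethodException above.
    FileContext fc = FileContext.getFileContext(bucket, conf);
    fc.util().listStatus(new Path("/"));
  }
}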
To resolve this problem, remove the fs.AbstractFileSystem.s3a.impl setting from core-site.xml. Just having the fs.s3a.impl setting in core-site.xml should solve your problem.
EDIT:
org.apache.hadoop.fs.s3a.S3AFileSystem just extends FileSystem. Hence, you cannot set the value of fs.AbstractFileSystem.s3a.impl to org.apache.hadoop.fs.s3a.S3AFileSystem, since org.apache.hadoop.fs.s3a.S3AFileSystem does not extend AbstractFileSystem.
I am using Hadoop 2.7.0, and in this version s3A is not exposed as an AbstractFileSystem.
There is a JIRA ticket, https://issues.apache.org/jira/browse/HADOOP-11262, to implement this, and the fix is available in Hadoop 2.8.0.
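For context, the fix in that ticket is essentially a thin AbstractFileSystem adapter that delegates to S3AFileSystem and provides the (URI, Configuration) constructor FileContext expects. A rough sketch (not the exact upstream source) looks like this:
package org.apache.hadoop.fs.s3a;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.DelegateToFileSystem;

public class S3A extends DelegateToFileSystem {
  public S3A(URI theUri, Configuration conf) throws IOException, URISyntaxException {
    // "s3a" is the supported URI scheme; false means the authority (the bucket)
    // does not have to carry a port.
    super(theUri, new S3AFileSystem(), conf, "s3a", false);
  }
}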
Assuming your jar has exposed s3A as an AbstractFileSystem, you need to set the following for fs.AbstractFileSystem.s3a.impl:
<property>
  <name>fs.AbstractFileSystem.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3A</value>
</property>
That will solve your problem.