
S3N and S3A distcp not working in Hadoop 2.6.0

Summary

A stock hadoop-2.6.0 install gives me No FileSystem for scheme: s3n. Adding hadoop-aws.jar to the classpath now gives me ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem.

Details

I've got a mostly stock install of hadoop-2.6.0. Beyond setting the directories, I've only set the following environment variables:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre
export HADOOP_COMMON_HOME=/opt/hadoop
export HADOOP_HOME=$HADOOP_COMMON_HOME
export HADOOP_HDFS_HOME=$HADOOP_COMMON_HOME
export HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME
export HADOOP_OPTS=-XX:-PrintWarnings
export PATH=$PATH:$HADOOP_COMMON_HOME/bin

The hadoop classpath is:

/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/lib/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/hdfs:/opt/hadoop/share/hadoop/hdfs/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/yarn/lib/*:/opt/hadoop/share/hadoop/yarn/*:/opt/hadoop/share/hadoop/mapreduce/lib/*:/opt/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/hadoop/share/hadoop/tools/lib/*

When I try to run

hadoop distcp -update hdfs:///files/to/backup s3n://${S3KEY}:${S3SECRET}@bucket/files/to/backup

I get Error: java.io.IOException, No FileSystem for scheme: s3n. If I use s3a instead, I get the same error complaining about s3a.

The internet told me that hadoop-aws.jar is not part of the classpath by default. I added the following line to /opt/hadoop/etc/hadoop/hadoop-env.sh:

HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_COMMON_HOME/share/hadoop/tools/lib/*

and now hadoop classpath has the following appended to it:

:/opt/hadoop/share/hadoop/tools/lib/*

which should cover /opt/hadoop/share/hadoop/tools/lib/hadoop-aws-2.6.0.jar. Now I get:

Caused by: java.lang.ClassNotFoundException:
Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

The jar file contains the class that can't be found:

unzip -l /opt/hadoop/share/hadoop/tools/lib/hadoop-aws-2.6.0.jar | grep S3AFileSystem
28349  2014-11-13 21:20   org/apache/hadoop/fs/s3a/S3AFileSystem.class
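
For reference, one way to confirm the wildcard actually expands to include the jar (the --glob option should exist in Hadoop 2.6.0's classpath command):

hadoop classpath --glob | tr ':' '\n' | grep hadoop-aws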

Is there an order to adding these jars, or am I missing something else critical?

asked May 07 '15 by Steve Armstrong

3 Answers

Working from Abhishek's comment on his answer, the only change I needed to make was to mapred-site.xml:

<property>
  <!-- Add to the classpath used when running an M/R job -->
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
</property>

No changes needed to any other xml or sh files.
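
As an aside (not part of the original fix), the key and secret can be kept out of the distcp URI by putting credentials in core-site.xml instead; the property names below are the standard ones for Hadoop 2.6, with placeholder values:

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<!-- for s3a the equivalents are fs.s3a.access.key and fs.s3a.secret.key -->

With that in place, the command shortens to hadoop distcp -update hdfs:///files/to/backup s3n://bucket/files/to/backup.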

answered Nov 03 '22 by Steve Armstrong


You can resolve the s3n issue by adding the following property to core-site.xml:

<property>
  <name>fs.s3n.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  <description>The FileSystem for s3n: (Native S3) uris.</description>
</property>

It should work after adding that property.
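
If you also need the s3a scheme, the analogous mapping (added here for completeness; it is not part of the original answer) would be:

<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  <description>The FileSystem for s3a: uris.</description>
</property>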

Edit: If that doesn't resolve your problem, then you will have to add the jars to the classpath. Can you check whether mapred-site.xml has mapreduce.application.classpath set to /usr/hdp//hadoop-mapreduce/*? That setting pulls the other related jars into the classpath :)

answered Nov 03 '22 by Abhishek


In current Hadoop (3.1.1) this approach no longer works. You can fix this by uncommenting the HADOOP_OPTIONAL_TOOLS line in the etc/hadoop/hadoop-env.sh file. Among other tools, this enables the hadoop-aws library.
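
For illustration, the relevant line looks something like this once uncommented (the exact list of optional tools in the shipped file varies by release; hadoop-aws is the entry that matters here):

# in etc/hadoop/hadoop-env.sh
export HADOOP_OPTIONAL_TOOLS="hadoop-aws"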

answered Nov 03 '22 by Pavla