 

Spark 1.3.1: cannot read file from S3 bucket, org/jets3t/service/ServiceException

I'm on an AWS EC2 VM (Ubuntu 14.04), trying to do some basic work with Spark on RDDs built from my S3 files. After successfully running this quick-and-dirty command (not using sparkContext.hadoopConfiguration for the moment)

scala> val distFile = sc.textFile("s3n://<AWS_ACCESS_KEY_ID>:<AWS_SECRET_ACCESS_KEY>@bucketname/folder1/folder2/file.csv")

I then get the following error when running distFile.count():

java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
         at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:334)
         at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:324)
         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
...
...
Caused by: java.lang.ClassNotFoundException: org.jets3t.service.ServiceException
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

I have previously

  • defined an AWS IAM user with corresponding AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  • added the export of both keys as env variables in .bashrc
  • built Spark 1.3.1 with SPARK_HADOOP_VERSION=2.6.0-cdh5.4.1 sbt/sbt assembly
  • installed and run hadoop 2.6-cdh5.4.1 (pseudo distributed)

Does it have to do with the syntax of textFile("s3n:// ...")? I've tried other schemes, including s3://, without success ...
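For what it's worth, the sparkContext.hadoopConfiguration approach I'm deferring would look roughly like this (a sketch only; the property names are the standard Hadoop s3n keys, and the bucket path is a placeholder):

```scala
// Set the s3n credentials on the Hadoop configuration instead of
// embedding them in the URL (keys in URLs leak into logs and shell history).
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))

// The URL then needs no embedded credentials.
val distFile = sc.textFile("s3n://bucketname/folder1/folder2/file.csv")
```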

Thank you

asked Nov 01 '22 by guzu92


1 Answer

Add the JetS3t JAR to your classpath, choosing a version compatible with your current Hadoop setup. The missing class, org.jets3t.service.ServiceException, lives in that JAR; Hadoop's NativeS3FileSystem depends on it, which is why the NoClassDefFoundError appears as soon as an s3n:// path is touched.
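One simple way to do that with the Spark shell is the --jars flag, which puts the JAR on both the driver and executor classpaths (the path and version below are placeholders; pick the jets3t version that matches your Hadoop 2.6 / CDH 5.4.1 build, typically a 0.9.x release):

```shell
# Launch spark-shell with the JetS3t jar on the classpath.
# Path and version are examples only; adjust to your installation.
./bin/spark-shell --jars /path/to/jets3t-0.9.3.jar
```

Alternatively, set spark.driver.extraClassPath and spark.executor.extraClassPath in conf/spark-defaults.conf if you want the JAR picked up for every session.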

answered Nov 09 '22 by Mohammad Adnan