I'm on an AWS EC2 VM (Ubuntu 14.04), trying to do some basic work with Spark on RDDs built from my S3 files. The following quick-and-dirty command runs successfully (not going through sparkContext.hadoopConfiguration for the moment):
scala> val distFile = sc.textFile("s3n://<AWS_ACCESS_KEY_ID>:<AWS_SECRET_ACCESS_KEY>@bucketname/folder1/folder2/file.csv")
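For reference, the sparkContext.hadoopConfiguration route I'm deferring would, if I understand the standard Hadoop s3n property names correctly, look roughly like this (untested on my side):

scala> // set the s3n credentials in the Hadoop configuration instead of the URL
scala> sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))
scala> sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))
scala> val distFile = sc.textFile("s3n://bucketname/folder1/folder2/file.csv")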
With the inline-credentials command above, I then get the following error when running distFile.count():
java.lang.NoClassDefFoundError: org/jets3t/service/ServiceException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:334)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:324)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2596)
...
...
Caused by: java.lang.ClassNotFoundException: org.jets3t.service.ServiceException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
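The missing class can be probed directly from the shell to see whether it is on the classpath at all (a simple diagnostic; the class name is taken from the trace above):

scala> Class.forName("org.jets3t.service.ServiceException")  // throws ClassNotFoundException, consistent with the trace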
I have previously exported both keys as env variables in .bashrc, and built Spark with:

SPARK_HADOOP_VERSION=2.6.0-cdh5.4.1 sbt/sbt assembly
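To double-check that the .bashrc exports are actually visible inside spark-shell (assuming the variables are named after the keys, as in the URL above), both of these should return true:

scala> sys.env.get("AWS_ACCESS_KEY_ID").isDefined
scala> sys.env.get("AWS_SECRET_ACCESS_KEY").isDefined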
Does it have to do with the syntax of textFile("s3n://...")? I've tried other schemes, including s3://, without success ...
Thank you
Include the JetS3t jar on your classpath, using a version compatible with your current Hadoop setup. The missing class, org.jets3t.service.ServiceException, lives in that jar.
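For example, if you build your application with sbt, a dependency along these lines pulls JetS3t in (the version shown is indicative; match it to the JetS3t version your Hadoop distribution ships, e.g. 0.9.0 for Hadoop 2.6):

libraryDependencies += "net.java.dev.jets3t" % "jets3t" % "0.9.0"

For spark-shell, the equivalent is to pass the jets3t jar that ships under your Hadoop installation via the --jars option.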