NoClassDefFoundError org.apache.hadoop.fs.FSDataInputStream when executing spark-shell

Question

I've downloaded the prebuilt version of Spark 1.4.0 without Hadoop (with user-provided Hadoop). When I run the spark-shell command, I get this error:

> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
        at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
        at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:111)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:111)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:97)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:106)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 7 more

I've searched the Internet, and the usual explanation is that HADOOP_HOME has not been set in spark-env.cmd. But I cannot find spark-env.cmd in the Spark installation folder. I've traced the spark-shell command and it seems there is no HADOOP_CONFIG in there. I've tried adding HADOOP_HOME as an environment variable, but it still gives the same exception.

Actually, I'm not really using Hadoop. I downloaded Hadoop as a workaround, as suggested in this question.

I am using Windows 8 and Scala 2.10.

Any help will be appreciated. Thanks.

tiho · Accepted Answer

The "without Hadoop" in the Spark build name is misleading: it means the build is not tied to a specific Hadoop distribution, not that it is meant to run without one. The user has to indicate where to find Hadoop (see https://spark.apache.org/docs/latest/hadoop-provided.html).

One clean way to fix this issue is to:

1- Obtain Hadoop Windows binaries. Ideally build them, but this is painful (for some hints see: Hadoop on Windows Building/ Installation Error). Otherwise Google some up, for instance currently you can download 2.6.0 from here: http://www.barik.net/archive/2015/01/19/172716/

2- Create a spark-env.cmd file looking like this (modify Hadoop path to match your installation):

    @echo off
    set HADOOP_HOME=D:\Utils\hadoop-2.7.1
    set PATH=%HADOOP_HOME%\bin;%PATH%
    set SPARK_DIST_CLASSPATH=<paste here the output of %HADOOP_HOME%\bin\hadoop classpath>

3- Put this spark-env.cmd either in a conf folder located at the same level as your Spark base folder (which may look weird), or in a folder indicated by the SPARK_CONF_DIR environment variable.

Hamed MP · Answer

I had the same problem. In fact, the Getting Started page of Spark explains how to handle it:

### in conf/spark-env.sh ###

# If 'hadoop' binary is on your PATH
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# With explicit path to 'hadoop' binary
export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)

# Passing a Hadoop configuration directory
export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)

If you want to use your own Hadoop, pick the one of the three options above that matches your setup and copy it into your spark-env.sh file:

1- the hadoop binary is already on your PATH

2- you want to give the path to the hadoop binary explicitly

3- you want to pass a Hadoop configuration directory

http://spark.apache.org/docs/latest/hadoop-provided.html
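If you are not sure which of the three cases applies on a given machine, a guarded variant of the same idea might look like the sketch below. It is only an illustration, not something from the Spark docs: resolve_hadoop_bin is a hypothetical helper name, and the fallback to $HADOOP_HOME/bin/hadoop assumes the usual Hadoop directory layout.

```shell
# Sketch for conf/spark-env.sh: only set SPARK_DIST_CLASSPATH when a
# hadoop binary can actually be located.
# resolve_hadoop_bin is a hypothetical helper, not part of Spark or Hadoop.
resolve_hadoop_bin() {
  if command -v hadoop >/dev/null 2>&1; then
    # Case 1: 'hadoop' is already on PATH; print where it was found.
    command -v hadoop
  elif [ -n "${HADOOP_HOME:-}" ] && [ -x "${HADOOP_HOME}/bin/hadoop" ]; then
    # Case 2: fall back to an explicit path (assumes the standard layout).
    printf '%s\n' "${HADOOP_HOME}/bin/hadoop"
  else
    return 1
  fi
}

if hadoop_bin=$(resolve_hadoop_bin); then
  export SPARK_DIST_CLASSPATH=$("$hadoop_bin" classpath)
else
  echo "hadoop not found: put it on PATH or set HADOOP_HOME" >&2
fi
```

The point of the guard is that a missing hadoop binary produces a readable message at startup instead of an empty SPARK_DIST_CLASSPATH and the NoClassDefFoundError from the question.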

Jimson James · Answer

I too had the issue,

export SPARK_DIST_CLASSPATH=`hadoop classpath`

resolved the issue.
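To check that the export actually picked something up before starting spark-shell, you could print the entries one per line. This is just a quick sketch: list_classpath_entries is a made-up helper name, and it assumes the Unix ':' separator (on Windows the separator is ';').

```shell
# list_classpath_entries is a hypothetical helper used only for this sketch.
list_classpath_entries() {
  # Fail when the classpath string is empty, which is what you get
  # when 'hadoop classpath' could not be run.
  [ -n "$1" ] || return 1
  # Split on ':' so each classpath entry lands on its own line.
  printf '%s\n' "$1" | tr ':' '\n'
}

if ! list_classpath_entries "${SPARK_DIST_CLASSPATH:-}"; then
  echo "SPARK_DIST_CLASSPATH is empty; check that 'hadoop classpath' works" >&2
fi
```

If the list is empty or contains no Hadoop directories, the export did not work and the same exception is to be expected.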
