
If I already have Hadoop installed, should I download Apache Spark WITH Hadoop or WITHOUT Hadoop?

I already have Hadoop 3.0.0 installed. Should I now install the with-hadoop or without-hadoop version of Apache Spark from this page?

I am following this guide to get started with Apache Spark.
It says

Download the latest version of Apache Spark (Pre-built according to your Hadoop version) from this link:...

But I am confused. If I already have an instance of Hadoop running on my machine, and I then download, install, and run Apache-Spark-WITH-Hadoop, won't it start an additional instance of Hadoop?

JBel asked Dec 24 '22 10:12



1 Answer

First off, as far as I know, Spark does not yet support Hadoop 3. You can see this on the download page: none of the pre-built packages offered for "your Hadoop version" targets Hadoop 3.

Regardless of which package you download, though, you can try setting HADOOP_CONF_DIR and HADOOP_HOME in your spark-env.sh.

If you already have Hadoop installed, you should always download the version without Hadoop.
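As a rough sketch of what that spark-env.sh could look like with the without-Hadoop package: the install path /usr/local/hadoop is only an assumed example, so substitute wherever your Hadoop actually lives. The SPARK_DIST_CLASSPATH line is not mentioned above. It is the setting Spark's documentation describes for its "Hadoop free" builds, and it relies on the `hadoop classpath` command from your existing installation. Whether this works against Hadoop 3.0.0 depends on the compatibility caveat already noted.

```shell
# conf/spark-env.sh (sketch; the install path below is an assumed example)
export HADOOP_HOME=/usr/local/hadoop              # assumption: where your Hadoop is installed
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"  # directory holding core-site.xml, hdfs-site.xml, yarn-site.xml

# The without-Hadoop build ships no Hadoop jars, so point Spark at the
# jars of the existing installation via the `hadoop classpath` command.
export SPARK_DIST_CLASSPATH="$("$HADOOP_HOME/bin/hadoop" classpath)"
```

Spark sources this file on startup, so nothing here starts any Hadoop daemon. It only tells Spark where to find the configuration and libraries.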

won't it start another additional instance of Hadoop?

No. You would still need to explicitly configure and start that version of Hadoop.

The with-Hadoop download is, I believe, already configured to use the Hadoop libraries bundled with it.

OneCricketeer answered May 14 '23 23:05
