
Hadoop's HDFS with Spark

I am new to cluster computing and I am trying to set up a minimal 2-node Spark cluster. What I am still a bit confused about: do I have to set up a full Hadoop installation first, or does Spark ship with a bundled Hadoop version?

The material I find about Spark does not really make this clear. I understand that Spark is meant as an extension to Hadoop rather than a replacement, but whether it requires an independently running Hadoop installation is not clear to me.

I do require HDFS, so is it enough to use just the file-system part of Hadoop?

Could someone point out this probably obvious thing to me?

asked Mar 31 '15 by toobee


1 Answer

Apache Spark is independent of Hadoop. Spark allows you to use different sources of data (including HDFS) and is capable of running either in a standalone cluster or on top of an existing resource management framework (e.g. YARN, Mesos).

So if you're only interested in Spark, there is no need to install Hadoop.
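A minimal sketch of what this looks like in practice (the master URL, host names, ports, and file paths below are placeholders I made up, not values from the question):

    import org.apache.spark.{SparkConf, SparkContext}

    object StorageDemo {
      def main(args: Array[String]): Unit = {
        // Point the application at a Spark standalone master.
        // "master-host:7077" is a placeholder for your own master node.
        val conf = new SparkConf()
          .setAppName("hdfs-or-local-demo")
          .setMaster("spark://master-host:7077")
        val sc = new SparkContext(conf)

        // Spark reads from whatever file system the URI scheme names:
        // a local path works without any Hadoop services running at all ...
        val localLines = sc.textFile("file:///tmp/input.txt")

        // ... while an hdfs:// URI needs a running HDFS NameNode and DataNodes,
        // but still no YARN or MapReduce components.
        val hdfsLines = sc.textFile("hdfs://namenode-host:8020/data/input.txt")

        println(s"local line count: ${localLines.count()}")
        println(s"hdfs line count:  ${hdfsLines.count()}")

        sc.stop()
      }
    }

The standalone master and workers come with the Spark distribution itself; only the hdfs:// read requires Hadoop's HDFS daemons to be running, which matches the question: installing just the HDFS part of Hadoop is sufficient if that is the storage you want.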

answered Oct 10 '22 by Freddy