I am new to cluster computing and I am trying to set up a minimal 2-node Spark cluster. What I am still confused about: do I have to set up a full Hadoop installation first, or does Spark ship with a bundled Hadoop version?
The material I find about Spark does not make this very clear. I understand that Spark is meant as an extension to Hadoop rather than a replacement for it, but whether it requires an independently running Hadoop system is not clear to me.
If I require HDFS, is it enough to use just the file system part of Hadoop?
Could someone point this probably obvious thing out to me?
Apache Spark is independent of Hadoop. Spark allows you to use different data sources (including HDFS) and is capable of running either in a standalone cluster or on an existing resource management framework (e.g. YARN, Mesos).
So if you're only interested in Spark, there is no need to install Hadoop.
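To illustrate, here is a minimal sketch of a Spark job that runs without any Hadoop installation, reading from the local filesystem. The file path and master URL are placeholders for your own setup; swapping in an hdfs:// URI would require a running HDFS.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: count lines of a local file with Spark alone, no Hadoop/HDFS needed.
object LocalLineCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LocalLineCount")
      // "local[*]" runs Spark in-process; for a standalone cluster you would
      // point this at your master, e.g. "spark://master-host:7077" (placeholder).
      .master("local[*]")
      .getOrCreate()

    // A file:// path works without HDFS; an hdfs:// path would need HDFS running.
    val lines = spark.read.textFile("file:///tmp/input.txt")
    println(s"Line count: ${lines.count()}")

    spark.stop()
  }
}
```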