I am new to cluster computing and I am trying to set up a minimal 2-node Spark cluster. What I am still confused about: do I have to set up a full Hadoop installation first, or does Spark ship with a bundled Hadoop version?
The material I find about Spark does not make this very clear. I understand that Spark is meant as an extension to Hadoop rather than a replacement for it, but whether it requires an independently running Hadoop system is not clear to me.
If I require HDFS, is it enough to use just the file system part of Hadoop?
Could someone point this probably obvious thing out to me?
Apache Spark is independent of Hadoop. Spark allows you to use different data sources (including HDFS) and is capable of running either in a standalone cluster or on an existing resource management framework (e.g. YARN, Mesos).
So if you're only interested in Spark, there is no need to install Hadoop.
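To illustrate, here is a minimal sketch of a Spark job that runs without any Hadoop installation, reading from the local filesystem. The file path and master URL are placeholders for your own setup; swapping in an hdfs:// URI would require a running HDFS.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: count lines of a local file with Spark alone, no Hadoop/HDFS needed.
object LocalLineCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LocalLineCount")
      // "local[*]" runs Spark in-process; for a standalone cluster you would
      // point this at your master, e.g. "spark://master-host:7077" (placeholder).
      .master("local[*]")
      .getOrCreate()

    // A file:// path works without HDFS; an hdfs:// path would need HDFS running.
    val lines = spark.read.textFile("file:///tmp/input.txt")
    println(s"Line count: ${lines.count()}")

    spark.stop()
  }
}
```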