
Can I use Spark without Hadoop for a development environment?

I'm very new to the concepts of Big Data and related areas; sorry if I've made a mistake or typo.

I would like to understand Apache Spark and use it only on my computer, in a development / test environment. Since Hadoop includes HDFS (Hadoop Distributed File System) and other software that only matters for distributed systems, can I discard it? If so, where can I download a version of Spark that doesn't need Hadoop? Here I can find only Hadoop-dependent versions.

What I need:

  • Run all Spark features without problems, but on a single computer (my home computer).
  • Everything I build on my computer with Spark should run on a future cluster without problems.

Is there any reason to use Hadoop or any other distributed file system for Spark if I will run it on my computer for testing purposes?

Note that "Can apache spark run without hadoop?" is a different question from mine, because I do want to run Spark in a development environment.

Paladini asked Sep 12 '15

People also ask

Can Spark work without Hadoop?

Do I need Hadoop to run Spark? No, but if you run on a cluster, you will need some form of shared file system (for example, NFS mounted at the same path on each node). If you have this type of filesystem, you can just deploy Spark in standalone mode.
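
To make this concrete, below is a minimal sketch of Spark running entirely in local mode, with no HDFS and no cluster manager involved. It assumes only that the pyspark package is installed (pip install pyspark); everything else is a placeholder.

    # A minimal sketch: Spark in local mode, no Hadoop or HDFS involved.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[*]")   # run in-process, using all local cores
             .appName("local-dev")
             .getOrCreate())

    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    print(df.count())              # 2 -- executes entirely on this machine

    spark.stop()

"local[*]" tells Spark to run the driver and executors inside a single JVM, which is exactly the single-computer development setup the question describes.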

Does Spark only run on Hadoop?

You can run Spark without Hadoop in standalone mode; Hadoop is not essential to run Spark, although the two work well together. According to the Spark documentation, there is no need for Hadoop if you run Spark in standalone mode; in that case you only need a cluster manager, such as Spark's built-in standalone manager, YARN, or Mesos.
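
To illustrate the difference, here is the same kind of session pointed at a standalone cluster instead of local mode; spark://master-host:7077 is a hypothetical address, to be replaced with your own master's URL.

    # Sketch: targeting a standalone cluster manager instead of local mode.
    # "master-host" is a placeholder for the machine running the Spark master.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("spark://master-host:7077")  # standalone manager; no YARN or Mesos
             .appName("standalone-demo")
             .getOrCreate())

    print(spark.sparkContext.master)  # shows which cluster manager is in use
    spark.stop()

In practice you would usually leave .master() out of the code and pass --master to spark-submit instead, so the same script runs locally today and on a cluster later without changes.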

Do we need HDFS for running Spark application?

No, it is not mandatory, but Spark has no storage system of its own, so it can use the local file system to store data. You can load data from the local file system and process it; Hadoop or HDFS is not mandatory to run a Spark application.
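
As an illustration, the sketch below reads and writes the local file system with explicit file:// URIs; the /tmp paths are hypothetical and stand in for any local file.

    # Sketch: local-file I/O in Spark, no HDFS required.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[*]")
             .appName("local-io")
             .getOrCreate())

    # file:// forces the local file system (hypothetical path).
    df = spark.read.csv("file:///tmp/people.csv", header=True)
    df.show()

    # Writing back to the local file system works the same way.
    df.write.mode("overwrite").csv("file:///tmp/people_out")

    spark.stop()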

Does Spark replace Hadoop?

So when people say that Spark is replacing Hadoop, it actually means that big data professionals now prefer to use Apache Spark for processing data instead of Hadoop MapReduce. MapReduce and Hadoop are not the same: MapReduce is just the component that processes data in Hadoop, and Spark can fill the same role.
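
For a sense of why Spark took over the processing role, here is the classic MapReduce example, word count, as a short Spark job (a sketch in local mode; the input lines are made up):

    # Word count -- the canonical MapReduce example -- in a few lines of Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("wordcount").getOrCreate()
    sc = spark.sparkContext

    lines = sc.parallelize(["to be or not to be", "that is the question"])
    counts = (lines.flatMap(lambda line: line.split())   # map phase
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))     # reduce phase

    print(counts.collect())   # e.g. [('to', 2), ('be', 2), ('or', 1), ...]
    spark.stop()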


1 Answer

Yes, you can install Spark without Hadoop. Go through the official Spark documentation: http://spark.apache.org/docs/latest/spark-standalone.html

Rough steps:

  1. Download precompiled Spark, or download the Spark source and build it locally.
  2. Extract the tarball.
  3. Set the required environment variables (such as SPARK_HOME).
  4. Run the start scripts. A quick smoke test is sketched below.
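
Once the start scripts have brought up a master and at least one worker, a tiny job confirms the installation works end to end. A sketch, assuming it is saved as smoke_test.py (a hypothetical file name) and submitted with $SPARK_HOME/bin/spark-submit:

    # smoke_test.py -- trivial job to verify a fresh Spark install.
    # The master URL comes from spark-submit (or defaults to local mode).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("smoke-test").getOrCreate()

    rdd = spark.sparkContext.parallelize(range(100))
    assert rdd.sum() == 4950   # 0 + 1 + ... + 99

    spark.stop()
    print("Spark install looks OK")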

Spark (without Hadoop) is available on the Spark download page. URL: https://www.apache.org/dyn/closer.lua/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz (this prebuilt package bundles the Hadoop client libraries, but it does not require a running Hadoop cluster).

If this URL does not work, try to get it from the Spark download page.

pradeep answered Sep 22 '22