Can I use Spark without Hadoop for development environment?

Tags:

filesystems

apache-spark

hadoop

I'm very new to Big Data and related areas, so sorry if I've made a mistake or typo.

I would like to understand Apache Spark and use it only on my computer, in a development / test environment. Since Hadoop includes HDFS (Hadoop Distributed File System) and other software that only matters for distributed systems, can I leave it out? If so, where can I download a version of Spark that doesn't need Hadoop? Here I can find only Hadoop-dependent versions.

What I need:

Run all Spark features without problems, but on a single computer (my home computer).
Everything that I build on my computer with Spark should run on a future cluster without problems.

Is there any reason to use Hadoop or any other distributed file system for Spark if I will only run it on my computer for testing purposes?

Note that "Can apache spark run without hadoop?" is a different question from mine, because I want to run Spark in a development environment.

453

asked Sep 12 '15 00:09

Paladini

1 Answer

Yes, you can install Spark without Hadoop. See the official Spark documentation: http://spark.apache.org/docs/latest/spark-standalone.html

Rough steps:

Download a precompiled Spark package, or download the Spark source and build it locally.
Extract the TAR archive.
Set the required environment variables (for example, SPARK_HOME).
Run the start script.
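To make the last two steps concrete, here is a minimal Python sketch of what "set the environment variable and run the start script" can look like for a one-machine standalone setup. The function names and the `/opt/spark` path are just illustrations, not something from the Spark docs. The script names assume a Spark 2.x layout, where the worker script is `sbin/start-slave.sh` (newer releases call it `start-worker.sh`).

```python
import os
import subprocess


def standalone_start_commands(spark_home, host="localhost", port=7077):
    """Build the environment and commands for a single-machine standalone setup."""
    # SPARK_HOME points at the directory the TAR archive was extracted into.
    env = dict(os.environ, SPARK_HOME=spark_home)
    master_url = f"spark://{host}:{port}"
    commands = [
        # Start the master and pin it to a known host and port.
        [f"{spark_home}/sbin/start-master.sh", "--host", host, "--port", str(port)],
        # Start one worker and register it with that master.
        # Assumption: Spark 2.x script name; newer releases use start-worker.sh.
        [f"{spark_home}/sbin/start-slave.sh", master_url],
    ]
    return env, commands


def start_standalone(spark_home):
    """Run the start scripts; call this only after Spark has been extracted."""
    env, commands = standalone_start_commands(spark_home)
    for command in commands:
        subprocess.run(command, env=env, check=True)
```

Calling `start_standalone("/opt/spark")` would then bring up a master and one worker on the same machine. For plain local testing you may not even need this, since Spark can also run everything inside a single process with a `local[*]` master.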

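Spark (no Hadoop installation required) is available from the Spark download page. URL: https://www.apache.org/dyn/closer.lua/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz

Despite the "hadoop2.7" in the file name, this prebuilt package only bundles the Hadoop client libraries. It does not need a running Hadoop cluster or HDFS.

If this URL does not work, get the package from the Spark download page.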
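As an illustration of running without HDFS, here is a minimal sketch that assumes PySpark is installed (for example via `pip install pyspark`). It reads a file from the local file system through a `file://` URI and runs with a `local[*]` master. The names `local_uri` and `word_count` are made up for this example.

```python
from pathlib import Path


def local_uri(path):
    """Turn a local path into a file:// URI so Spark reads from the local disk."""
    return Path(path).resolve().as_uri()


def word_count(input_path, master="local[*]"):
    """Count words in a text file; 'local[*]' runs Spark inside this one process."""
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(master)                # a spark://host:port URL would target a cluster
        .appName("local-dev-sketch")   # hypothetical application name
        .getOrCreate()
    )
    try:
        lines = spark.sparkContext.textFile(local_uri(input_path))
        counts = (
            lines.flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b)
        )
        return dict(counts.collect())
    finally:
        spark.stop()
```

The job logic is the same on a cluster. What typically changes is the master URL and the input location, which would then need to be storage that every node can reach.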

112

answered Sep 22 '22 01:09

pradeep
