As Spark is growing in the market nowadays, I can see Spark's major use cases over Hadoop, like:
My question is:
Spark differs from Hadoop in that it lets you integrate data ingestion, processing, and real-time analytics in one tool. Moreover, Spark's map/reduce framework differs from standard Hadoop MapReduce because in Spark intermediate map/reduce results are cached, and an RDD (an abstraction for a distributed collection that is fault-tolerant) can be kept in memory when the same results need to be reused (iterative algorithms, group by, etc.).
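To make the caching point concrete, here is a minimal plain-Python sketch (not real Spark code; the function and variable names are illustrative) of why reusing a cached intermediate result helps iterative workloads, which is what `RDD.cache()`/`persist()` enables in Spark:

```python
# Illustrative sketch: counting how often an expensive map stage runs.
recomputations = 0

def expensive_map(records):
    """Stand-in for a costly map stage (e.g. parsing or joining)."""
    global recomputations
    recomputations += 1
    return [r * 2 for r in records]

data = list(range(5))

# Without caching: every iteration recomputes the map stage,
# much like classic Hadoop MapReduce re-reading from disk.
for _ in range(3):
    result = expensive_map(data)
passes_without_cache = recomputations

# With caching: compute once, then reuse the in-memory result
# across iterations (analogous to calling rdd.cache() in Spark).
recomputations = 0
cached = expensive_map(data)
for _ in range(3):
    result = cached
passes_with_cache = recomputations

print(passes_without_cache, passes_with_cache)  # 3 1
```

The point is only the ratio: three passes over the data cost three recomputations without caching, but a single one once the intermediate result is held in memory.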
My answer is really superficial and does not answer your question completely; it just points out some of the main differences (there are many more in reality). The Spark and Databricks official sites are well documented, and your question is already answered there:
https://databricks.com/spark/about
http://spark.apache.org/faq.html
Hadoop today is a collection of technologies, but in its essence it is a distributed file system (HDFS) and a distributed resource manager (YARN). Spark is a distributed computational framework that is poised to replace Map/Reduce - another distributed computational framework that is part of the Hadoop ecosystem.
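That division of labor (Spark as the compute layer, YARN managing resources, HDFS holding the data) shows up in how a Spark job is typically submitted to a Hadoop cluster; a hedged sketch, assuming a configured cluster, where `job.py` and the HDFS path are hypothetical:

```shell
# Spark does the computation; YARN allocates its resources
# (--master yarn); the input data lives on HDFS.
# job.py and hdfs:///data/input are illustrative placeholders.
spark-submit --master yarn --deploy-mode cluster \
  job.py hdfs:///data/input
```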
Specifically: Spark is not going to replace Hadoop, but it would probably replace Map/Reduce. Hadoop, Map/Reduce, and Spark are all distributed systems (and run in parallel).