I think we can all agree that HDFS + YARN + MapReduce can be called Hadoop. But what about other combinations and other products in the Hadoop ecosystem?
Is, for example, HDFS + YARN + Spark still Hadoop? Is HBase Hadoop? I guess we consider HDFS + YARN + Pig to be Hadoop, since Pig uses MapReduce.
Are only the MapReduce-based tools considered Hadoop, while anything else that runs on HDFS + YARN (like Spark) is not Hadoop?
I agree with your impression that the "Hadoop" term does not have a useful definition. "We have a Hadoop cluster" may mean various things.
There is an official answer though at http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F:
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
So "Hadoop" is the name of a project and a software library. Any other usage is ill-defined.
Hadoop is not a stack like LAMP or MEAN. Hadoop is a collection of frameworks and tools that work together to solve complex big data problems.
It is a project under the Apache Software Foundation. Hadoop's own modules, such as MapReduce, together with related Apache projects such as Ambari, Sqoop, Spark, and ZooKeeper, make up what is called the Hadoop ecosystem.
Source : https://www.datacloudschool.com/2020/01/introduction-what-is-hadoop.html