
What is "Hadoop" - the definition of Hadoop?

It seems obvious, and I think we would all agree, that HDFS + YARN + MapReduce can be called Hadoop. But what about other combinations and other products in the Hadoop ecosystem?

Is, for example, HDFS + YARN + Spark still Hadoop? Is HBase Hadoop? I guess we consider HDFS + YARN + Pig to be Hadoop, since Pig uses MapReduce.

Are only the MapReduce-based tools considered Hadoop, while anything else that runs on HDFS + YARN (like Spark) is not Hadoop?

neuromouse asked Jan 24 '15 19:01



2 Answers

I agree with your impression that the term "Hadoop" does not have a useful definition. "We have a Hadoop cluster" may mean various things.

There is an official answer, though, at http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F:

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

So "Hadoop" is the name of a project and a software library. Any other usage is ill-defined.

Daniel Darabos answered Nov 04 '22 15:11



Hadoop is not a stack like LAMP or MEAN. Hadoop is a collection of frameworks and tools that work together to solve complex big data problems.

It is a project under the Apache Software Foundation. Components and related projects such as MapReduce, Ambari, Sqoop, Spark, and ZooKeeper together make up what is called the Hadoop ecosystem.

Source: https://www.datacloudschool.com/2020/01/introduction-what-is-hadoop.html

ASHISH M.G answered Nov 04 '22 16:11
