What is "Hadoop" - the definition of Hadoop?

Question

It seems obvious, and I think we would all agree, that HDFS + YARN + MapReduce can be called Hadoop. But what about other combinations and other products in the Hadoop ecosystem?

Is, for example, HDFS + YARN + Spark still Hadoop? Is HBase Hadoop? I guess we consider HDFS + YARN + Pig to be Hadoop, since Pig uses MapReduce.

Are only the MapReduce-based tools considered Hadoop, while anything else that runs on HDFS + YARN (like Spark) is not Hadoop?

Daniel Darabos · Accepted Answer

I agree with your impression that the "Hadoop" term does not have a useful definition. "We have a Hadoop cluster" may mean various things.

There is an official answer though at http://hadoop.apache.org/#What+Is+Apache+Hadoop%3F:

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

So "Hadoop" is the name of a project and a software library. Any other usage is ill-defined.

ASHISH M.G · Answer

Hadoop is not a stack like the LAMP or MEAN stack. Hadoop is a collection of frameworks and tools that work together to solve complex big data problems.

It is basically a project under the Apache Software Foundation. Various modules and related projects such as MapReduce, Ambari, Sqoop, Spark, and ZooKeeper together make up what is called the Hadoop ecosystem.

Source : https://www.datacloudschool.com/2020/01/introduction-what-is-hadoop.html

Tags: apache-spark, hadoop, hadoop-yarn, hbase, hdfs

neuromouse
