Edge nodes in a Hadoop cluster

Tags:

hadoop

bigdata

Can someone explain the architecture of an edge node in Hadoop? I have only been able to find its definition on the internet. I have the following questions:

1) Does the edge node have to be part of the cluster? (What advantages do we gain if it is inside the cluster?) Does it store any blocks of data in HDFS?

2) Can the edge node be outside the cluster?

asked May 22 '13 06:05 by Vishnu Subramanian





1 Answer

+1 to the Dell explanation. In my view, edge nodes in a Hadoop cluster are the nodes responsible for running the cluster's client-side operations. Edge nodes are typically kept separate from the nodes that run Hadoop services such as HDFS and MapReduce, mainly to keep computing resources separate. In smaller clusters with only a few nodes, it's common to see nodes playing a hybrid combination of roles: master services (JobTracker (JT), NameNode (NN), etc.), slave services (TaskTracker (TT), DataNode (DN), etc.), and gateway services.
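To make the role split concrete, here is a minimal Python sketch that labels a node by the services it runs. The service names and the `Gateway` label for the client-side role are illustrative assumptions (real distributions name roles differently), and `classify_node` is a hypothetical helper, not part of any Hadoop API.

```python
# Hypothetical service labels, grouped as in the answer; real role names vary by distribution.
MASTER_SERVICES = {"NameNode", "JobTracker"}
SLAVE_SERVICES = {"DataNode", "TaskTracker"}
GATEWAY_SERVICES = {"Gateway"}  # client-side / edge role (assumed name)


def classify_node(services: set) -> str:
    """Return a coarse role label for a node, given the set of services it runs."""
    roles = set()
    if services & MASTER_SERVICES:
        roles.add("master")
    if services & SLAVE_SERVICES:
        roles.add("slave")
    if services & GATEWAY_SERVICES:
        roles.add("edge")
    if not roles:
        return "unassigned"
    if len(roles) > 1:
        # Several role categories on one node: the small-cluster "hybrid" setup.
        return "hybrid(" + "+".join(sorted(roles)) + ")"
    return roles.pop()


if __name__ == "__main__":
    cluster = {
        "edge1": {"Gateway"},
        "nn1": {"NameNode", "JobTracker"},
        "dn1": {"DataNode", "TaskTracker"},
        "small1": {"NameNode", "JobTracker", "DataNode", "TaskTracker", "Gateway"},
    }
    for host, services in cluster.items():
        print(host, classify_node(services))
```

A dedicated edge node comes out as `edge`, while the all-in-one node typical of a small dev cluster comes out as a hybrid of all three categories.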

Note that running master and slave Hadoop services on the same node is not an ideal setup and can cause scaling and resource issues, depending on what is in use. This kind of configuration is typically seen in small-scale development environments.

With that said, here are some answers to the questions you posted:

1) Does the edge node have to be part of the cluster?

The edge node does not have to be part of the cluster. However, if it is outside the cluster (meaning no Hadoop service roles run on it), it still needs some basic pieces, such as the Hadoop binaries and the current cluster configuration files, to submit jobs to the cluster.
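As a rough illustration of why an off-cluster edge node needs the current config files, the sketch below reads the NameNode address from a `core-site.xml` string using Python's standard `xml.etree.ElementTree`. `fs.defaultFS` (and the older, deprecated `fs.default.name`) are real Hadoop property names; the host name, the sample file contents, and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical client-side core-site.xml, copied from the cluster to the edge node.
CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
"""


def read_property(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> of the named <property> in a Hadoop *-site.xml document, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def namenode_uri(core_site_xml: str) -> Optional[str]:
    """Prefer fs.defaultFS; fall back to the deprecated fs.default.name used by older releases."""
    return read_property(core_site_xml, "fs.defaultFS") or read_property(
        core_site_xml, "fs.default.name"
    )


if __name__ == "__main__":
    print(namenode_uri(CORE_SITE))
```

If neither key is set, Hadoop clients fall back to the default `file:///`, so jobs and `hdfs` commands would target the local file system instead of the cluster. That is one reason the edge node's config must match the cluster's.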

2) What advantages do we have if it is inside the cluster?

Depending on which distribution is in use, running edge nodes within the cluster allows centralized management of all the Hadoop configuration entries on the cluster nodes, which reduces the administration needed to update the config files. This is usually a one-to-many approach: config entries are updated in one location and pushed out to all (many) nodes in the cluster.
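The one-to-many idea can be sketched in a few lines of Python: one central copy of the configuration is compared with each node's copy and pushed only where it differs. This is a simplified model under stated assumptions (configs as plain dicts; `Node` and `push_config` are hypothetical names). Real distribution tooling also has to deliver the files and restart the affected services.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    hostname: str
    config: Dict[str, str] = field(default_factory=dict)


def push_config(central: Dict[str, str], nodes: List[Node]) -> List[str]:
    """Push the central config to every node whose copy differs; return the hosts updated."""
    updated = []
    for node in nodes:
        if node.config != central:
            node.config = dict(central)  # give each node its own copy, not a shared dict
            updated.append(node.hostname)
    return updated


if __name__ == "__main__":
    central = {"fs.defaultFS": "hdfs://namenode.example.com:8020"}
    nodes = [Node("edge1"), Node("dn1", dict(central)), Node("dn2")]
    print(push_config(central, nodes))  # first push: only the stale nodes are updated
    central["dfs.replication"] = "3"    # one edit in the central location...
    print(push_config(central, nodes))  # ...reaches every node
```

The design point is that administrators edit a single source of truth; the per-node copies are derived from it rather than edited by hand.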

However, when a node within the cluster also serves as an edge node, client operations consume CPU and memory that would otherwise be available to the Hadoop services running on that node.
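The trade-off is simple subtraction. The sketch below uses made-up capacities and client usage figures (assumptions, not measurements) to show how much of a shared node remains for Hadoop services; `headroom` is a hypothetical helper.

```python
from typing import Dict


def headroom(capacity: Dict[str, float], client_usage: Dict[str, float]) -> Dict[str, float]:
    """Resources left for Hadoop services after client operations take their share (floored at 0)."""
    return {res: max(0.0, capacity[res] - client_usage.get(res, 0.0)) for res in capacity}


if __name__ == "__main__":
    # Hypothetical shared node: 16 cores and 64 GB RAM; client tools on it use 4 cores and 12 GB.
    node = {"cores": 16, "memory_gb": 64}
    clients = {"cores": 4, "memory_gb": 12}
    left = headroom(node, clients)
    print(left)
    for res in node:
        print(f"{res}: {1 - left[res] / node[res]:.0%} of capacity taken by client operations")
```

With these numbers, a quarter of the cores and roughly a fifth of the memory are no longer available to the DataNode and task services on that node.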

3) Does it store any blocks of data in HDFS?

Unless the edge node is configured with a DataNode service, blocks of data will not be stored on that node.
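A toy model of this point: if replica targets are chosen only among hosts running a DataNode, an edge node without one can never receive a block. The selection below is uniform random purely for illustration. Real HDFS placement is rack-aware, so treat `choose_replica_targets` as a hypothetical helper rather than HDFS's policy. On a live cluster, `hdfs dfsadmin -report` lists the DataNodes, which is a quick way to confirm that the edge node is not among them.

```python
import random
from typing import Dict, List, Optional, Set


def choose_replica_targets(
    nodes: Dict[str, Set[str]], replication: int = 3, seed: Optional[int] = None
) -> List[str]:
    """Pick distinct hosts for a block's replicas, considering only hosts that run a DataNode."""
    datanodes = sorted(host for host, services in nodes.items() if "DataNode" in services)
    rng = random.Random(seed)
    # In this toy model each DataNode holds at most one replica of a given block.
    return rng.sample(datanodes, min(replication, len(datanodes)))


if __name__ == "__main__":
    cluster = {
        "edge1": {"Gateway"},  # client tools only: no DataNode, so never a target
        "nn1": {"NameNode"},
        "dn1": {"DataNode"},
        "dn2": {"DataNode"},
        "dn3": {"DataNode"},
    }
    print(choose_replica_targets(cluster, replication=3, seed=42))
```

Adding `"DataNode"` to `edge1`'s services would make it an eligible target, which matches the answer's caveat.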

4) Should the edge node be outside the cluster?

As mentioned above, it depends on the cluster environment and use case. One reason to configure it outside the cluster is to keep client operations and Hadoop services separate.

Keeping the edge node separate lets the cluster nodes devote their full computing resources to Hadoop processing.

Hope this helps!

answered Sep 22 '22 13:09 by Anthony R.
