Can someone explain the architecture of an edge node in Hadoop? I can only find the definition on the internet. I have the following questions:
1) Does the edge node have to be part of the cluster (what advantages do we have if it is inside the cluster)? Does it store any blocks of data in HDFS?
2) Can the edge node be outside the cluster?
An edge node device provides the intelligence to sense, measure, interpret, and connect to an internet gateway to the cloud. The data can be preprocessed with some form of analytics before it is transmitted for deeper data mining intelligence. Sensors form the front-end edge of the industrial IoT electronics ecosystem.
Edge clusters are IEAM edge nodes that are Kubernetes clusters. An edge cluster enables use cases at the edge, which require co-location of compute with business operations, or that require more scalability, availability, and compute capability than what can be supported by an edge device.
"Edge nodes" is another frequently used term, which can refer either to a broader set of compute resources (i.e. including end devices too) or to a cluster of edge servers.
+1 with the Dell explanation. In my opinion, edge nodes in a Hadoop cluster are typically the nodes responsible for running the client-side operations of the cluster. Edge nodes are usually kept separate from the nodes that run Hadoop services such as HDFS, MapReduce, etc., mainly to keep computing resources separate. For smaller clusters with only a few nodes, it's common to see nodes playing a hybrid combination of roles for master services (JobTracker, NameNode, etc.), slave services (TaskTracker, DataNode, etc.) and gateway services.
Note that running master and slave Hadoop services on the same node is not an ideal setup, and can cause scaling and resource issues depending on what's in use. This kind of configuration is typically seen in a small-scale dev environment.
With that said, here are some answers to the questions you posted:
The edge node does not have to be part of the cluster. However, if it is outside of the cluster (meaning it doesn't have any specific Hadoop service roles running on it), it will need some basic pieces, such as the Hadoop binaries and the current Hadoop cluster config files, to submit jobs to the cluster.
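As a rough illustration of the "config files" part, here is a minimal Python sketch that checks whether a client config directory looks usable and reads which file system the client would talk to. The choice of `core-site.xml` and `hdfs-site.xml` as the required set, the helper names, and the `/etc/hadoop/conf` fallback path are my own assumptions for the example; `HADOOP_CONF_DIR` and the `fs.defaultFS` property are the standard Hadoop names.

```python
import os
import xml.etree.ElementTree as ET

# Assumption: the minimum set of client config files needed for HDFS access.
REQUIRED_FILES = ("core-site.xml", "hdfs-site.xml")


def missing_config_files(conf_dir, required=REQUIRED_FILES):
    """Return the required config files that are absent from conf_dir."""
    return [
        name for name in required
        if not os.path.isfile(os.path.join(conf_dir, name))
    ]


def read_property(conf_file, prop_name):
    """Return the value of a <property> in a Hadoop *-site.xml file, or None."""
    root = ET.parse(conf_file).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == prop_name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


if __name__ == "__main__":
    # Hypothetical fallback path; adjust to wherever your distribution puts it.
    conf_dir = os.environ.get("HADOOP_CONF_DIR", "/etc/hadoop/conf")
    missing = missing_config_files(conf_dir)
    if missing:
        print("Missing client config files:", ", ".join(missing))
    else:
        core_site = os.path.join(conf_dir, "core-site.xml")
        print("fs.defaultFS =", read_property(core_site, "fs.defaultFS"))
```

If `fs.defaultFS` on the edge node does not point at the cluster's NameNode, client commands run from that node will not reach the cluster's HDFS, which is why the config files have to be kept current.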
Depending on which distribution is in use, edge nodes running within the cluster allow for centralized management of all the Hadoop configuration entries on the cluster nodes, which helps reduce the amount of administration needed to update the config files. Usually this is a one-to-many approach, where config entries are updated in one location and pushed out to all (many) nodes in the cluster.
However, when one of the nodes within the cluster is also used as an edge node, the client operations consume CPU and memory, which detracts from the resources available to the Hadoop services running on that node.
Unless the edge node is configured with a DataNode service, blocks of data will not be stored on that node.
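To make the role distinction concrete, here is a small toy model in Python. It is only a sketch for reasoning about the points above, not how any distribution represents nodes; the `Node` class and the "Gateway" role label are names I made up for the example.

```python
from dataclasses import dataclass

STORAGE_ROLE = "DataNode"   # the HDFS service that actually holds blocks
CLIENT_ROLE = "Gateway"     # hypothetical label for client-only (edge) duties


@dataclass(frozen=True)
class Node:
    name: str
    roles: frozenset

    def stores_blocks(self):
        """A node holds HDFS blocks only if it runs a DataNode service."""
        return STORAGE_ROLE in self.roles

    def is_dedicated_edge(self):
        """True if the node runs nothing beyond client/gateway duties."""
        return self.roles <= {CLIENT_ROLE}


if __name__ == "__main__":
    edge = Node("edge1", frozenset({CLIENT_ROLE}))
    hybrid = Node("dev1", frozenset({"NameNode", "DataNode", CLIENT_ROLE}))
    for node in (edge, hybrid):
        print(node.name, node.stores_blocks(), node.is_dedicated_edge())
```

Here `edge1` stores no blocks and is a dedicated edge node, while `dev1` is the small-cluster hybrid case: it stores blocks and its client work competes with the Hadoop services for CPU and memory.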
As mentioned above, it depends on the cluster environment and use case. One of the supporting reasons to configure it outside of the cluster is to keep the client operations and the Hadoop services separated.
Keeping an edge node separate allows that node to utilize the full computing resources available for Hadoop processing.
Hope this helps!