<p>Can someone explain the architecture of an edge node in Hadoop? I can find only the definition on the internet, and I have the following questions:</p> <p>1) Does the edge node have to be part of the cluster? What advantages do we have if it is inside the cluster? Does it store any blocks of data in HDFS?</p> <p>2) Can the edge node be outside the cluster?</p>

<p>+1 to the Dell explanation. In my opinion, edge nodes in a Hadoop cluster are typically the nodes responsible for running the cluster's client-side operations. Edge nodes are usually kept separate from the nodes that run Hadoop services such as HDFS and MapReduce, mainly to keep computing resources separate. For smaller clusters with only a few nodes, it's common to see nodes playing a hybrid combination of roles: master services (JobTracker, NameNode, etc.), slave services (TaskTracker, DataNode, etc.) and gateway services.</p> <p>Note that running master and slave Hadoop services on the same node is not an ideal setup, and can cause scaling and resource issues depending on what's in use. This kind of configuration is typically seen in a small-scale dev environment.</p> <p>With that said, here are some answers to your questions:</p> <h3>1) Does the edge node have to be part of the cluster?</h3> <p>The edge node does not have to be part of the cluster. However, if it is outside the cluster (meaning it doesn't have any Hadoop service roles running on it), it will need some basic pieces, such as the Hadoop binaries and the current Hadoop cluster config files, to submit jobs to the cluster.</p> <h3>2) What advantages do we have if it is inside the cluster?</h3> <p>Depending on which distribution is in use, edge nodes that run within the cluster allow for centralized management of the Hadoop configuration entries on all cluster nodes, which reduces the administration needed to update the config files. Usually this is a one-to-many approach, where config entries are updated in one location and pushed out to all (many) nodes in the cluster.</p> <p>However, when one of the nodes within the cluster is also used as an edge node, the client operations consume CPU and memory, which reduces the resources available to the Hadoop services running on that node.</p> <h3>3) Does it store any blocks of data in hdfs?</h3> <p>Unless the edge node is configured with a DataNode service, blocks of data will not be stored on that node.</p> <h3>4) Should the edge node be outside the cluster?</h3> <p>As mentioned above, it depends on the cluster environment and use case. One of the supporting reasons to configure it outside the cluster is to keep the client operations and the Hadoop services separated.</p> <p>Keeping the edge node separate lets the cluster nodes devote their full computing resources to Hadoop processing, since client workloads no longer compete with the Hadoop services.</p> <p>Hope this helps!</p>
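<p>To make point 1 a bit more concrete, here is a minimal Python sketch (not an official tool) that checks whether a host has the "basic pieces" of a client: a <code>hadoop</code> binary on the <code>PATH</code> and a <code>core-site.xml</code> that names the cluster's default filesystem. The helper names <code>read_hadoop_property</code> and <code>check_client_node</code> are made up for this example, and the fallback directory <code>/etc/hadoop/conf</code> is only an assumption about a common packaged layout; adjust it for your distribution.</p>

```python
import os
import shutil
import xml.etree.ElementTree as ET


def read_hadoop_property(conf_dir, name, filename="core-site.xml"):
    """Return the value of one property from a Hadoop *-site.xml file, or None."""
    path = os.path.join(conf_dir, filename)
    if not os.path.isfile(path):
        return None
    # Hadoop site files are <configuration> documents made of
    # <property><name>...</name><value>...</value></property> entries.
    root = ET.parse(path).getroot()
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def check_client_node(conf_dir=None):
    """Report whether this host has the basic pieces needed to act as a client."""
    # Assumption: fall back to /etc/hadoop/conf when HADOOP_CONF_DIR is unset.
    conf_dir = conf_dir or os.environ.get("HADOOP_CONF_DIR", "/etc/hadoop/conf")
    return {
        "hadoop_binary": shutil.which("hadoop"),  # None if not on PATH
        "conf_dir": conf_dir,
        "default_fs": read_hadoop_property(conf_dir, "fs.defaultFS"),
    }


if __name__ == "__main__":
    for key, value in check_client_node().items():
        print(f"{key}: {value if value else 'MISSING'}")
```

<p>If <code>hadoop_binary</code> or <code>default_fs</code> comes back as missing, the host probably cannot submit jobs yet. The same helper could be pointed at other site files through the <code>filename</code> argument, but which properties matter there depends on your Hadoop version and distribution.</p>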

Edge nodes in hadoop cluster

answered Sep 22 '22 13:09

Anthony R.

Tags:

hadoop

bigdata

Vishnu Subramanian
