Namenode high availability client request

Tags: hadoop, hadoop2, hdfs, webhdfs

Can anyone tell me: if I use a Java application to request file upload/download operations on HDFS with a NameNode HA setup, where does the request go first? In other words, how does the client know which NameNode is active?

It would be great if you could provide a workflow-type diagram or something similar that explains the request steps in detail (start to end).

asked Mar 10 '16 08:03

user2846382

1 Answer

Please check the NameNode HA architecture, which shows the key entities involved in handling HDFS client requests.

[Figure: HA architecture]

Where does this request go first? I mean, how would the client know which NameNode is active?

For the client/driver it doesn't matter which NameNode is active, because we address HDFS by nameservice ID rather than by the hostname of a NameNode. The client resolves the nameservice ID to the configured NameNodes and its requests end up at the active one.

Example: hdfs://nameservice_id/rest/of/the/hdfs/path

Explanation:

How does hdfs://nameservice_id/ work, and which configuration properties are involved?

In the hdfs-site.xml file:

Create a nameservice by giving it an ID (here the nameservice ID is mycluster)

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
  <description>Logical name for this new nameservice</description>
</property>

Now specify the NameNode IDs that identify the NameNodes in the nameservice

dfs.ha.namenodes.[$nameservice ID]:

<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
  <description>Unique identifiers for each NameNode in the nameservice</description>
</property>

Then link the NameNode IDs to the NameNode hosts

dfs.namenode.rpc-address.[$nameservice ID].[$name node ID]

<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>machine1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>machine2.example.com:8020</value>
</property>
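
To make the chain of keys concrete, here is a minimal sketch in plain Java of how the logical name in the URL leads to the two RPC addresses. It is not Hadoop code: `NameserviceResolver`, its `resolve` method, and the sample path are hypothetical, and the properties are passed as a simple `Map` instead of being read from hdfs-site.xml.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): shows how the HA keys chain together.
class NameserviceResolver {

    // Returns the RPC addresses configured for the nameservice named in the URI authority.
    static List<String> resolve(URI hdfsUri, Map<String, String> conf) {
        // The authority is the logical nameservice ID (e.g. "mycluster"), not a hostname.
        String nameservice = hdfsUri.getAuthority();
        String ids = conf.get("dfs.ha.namenodes." + nameservice);
        if (ids == null) {
            throw new IllegalArgumentException("No HA NameNodes configured for " + nameservice);
        }
        List<String> addresses = new ArrayList<>();
        for (String id : ids.split(",")) {
            String key = "dfs.namenode.rpc-address." + nameservice + "." + id.trim();
            String address = conf.get(key);
            if (address == null) {
                throw new IllegalArgumentException("Missing property " + key);
            }
            addresses.add(address);
        }
        return addresses;
    }

    public static void main(String[] args) {
        // Same values as the hdfs-site.xml properties in this answer.
        Map<String, String> conf = Map.of(
                "dfs.nameservices", "mycluster",
                "dfs.ha.namenodes.mycluster", "nn1,nn2",
                "dfs.namenode.rpc-address.mycluster.nn1", "machine1.example.com:8020",
                "dfs.namenode.rpc-address.mycluster.nn2", "machine2.example.com:8020");
        // The path below is an illustrative placeholder.
        System.out.println(resolve(URI.create("hdfs://mycluster/user/data/file.txt"), conf));
    }
}
```

The point of the sketch is that nothing in the URL names a machine; the candidate hosts come entirely from configuration.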

After that, specify the Java class that HDFS clients use to contact the active NameNode. The DFS client uses this class to determine which NameNode is currently serving client requests.

<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
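
The idea behind such a proxy provider can be illustrated with a simplified simulation in plain Java: the client tries the configured NameNodes in order, and a standby rejects the call so the client moves on to the next one. Everything here (`FailoverSketch`, `FakeNameNode`, the local `StandbyException`, `callWithFailover`) is a hypothetical stand-in written for this example, not Hadoop's actual implementation.

```java
import java.util.List;

// Simplified simulation of client-side failover; all names are hypothetical.
class FailoverSketch {

    // Local stand-in for the error a standby NameNode returns to client operations.
    static class StandbyException extends Exception {
        StandbyException(String message) {
            super(message);
        }
    }

    // Stand-in for a NameNode: only the active one serves requests.
    static class FakeNameNode {
        final String address;
        boolean active;

        FakeNameNode(String address, boolean active) {
            this.address = address;
            this.active = active;
        }

        String getFileInfo(String path) throws StandbyException {
            if (!active) {
                throw new StandbyException(address + " is in standby state");
            }
            return path + " served by " + address;
        }
    }

    // Tries each configured NameNode in order and returns the first successful reply.
    static String callWithFailover(List<FakeNameNode> nameNodes, String path) {
        for (FakeNameNode nn : nameNodes) {
            try {
                return nn.getFileInfo(path);
            } catch (StandbyException e) {
                // This NameNode is standby: fall through and try the next one.
            }
        }
        throw new IllegalStateException("No active NameNode found");
    }

    public static void main(String[] args) {
        List<FakeNameNode> nameNodes = List.of(
                new FakeNameNode("machine1.example.com:8020", false),  // standby
                new FakeNameNode("machine2.example.com:8020", true));  // active
        System.out.println(callWithFailover(nameNodes, "/data/file.txt"));
    }
}
```

A real client also has to deal with retries, back-off, and connection failures, which this sketch leaves out; it only shows why the application never needs to know in advance which NameNode is active.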

Finally, after these configuration changes the HDFS URL looks like this:

hdfs://mycluster/<file_location_in_hdfs>

To answer your question I have covered only a few configuration properties. Please check the detailed documentation for how the NameNodes, JournalNodes, and ZooKeeper machines together form NameNode HA in HDFS.

answered Oct 07 '22 15:10

mrsrinivas

