With the HDFS or HFTP URI scheme (e.g. hdfs://namenode/path/to/file) I can access HDFS clusters without needing their XML configuration files. This is very handy when running shell commands like hdfs dfs -get or hadoop distcp, or when reading files from Spark with sc.hadoopFile(), because I don't have to copy and manage XML files for every relevant HDFS cluster on every node where that code might run.
One drawback of this approach is that I have to use the active NameNode's hostname, otherwise Hadoop will throw an exception complaining that the NN is standby.
A usual workaround is to try one NameNode and fall back to the other if an exception is caught, or to connect to ZooKeeper directly and parse its binary data with protobuf.
Both of these methods are cumbersome compared to (for example) MySQL's load-balancing URI or ZooKeeper's connection string, where I can just comma-separate all the hosts in the URI and the driver automatically finds a node to talk to.
Say I have active and standby NameNode hosts nn1 and nn2. What is the simplest way to refer to a specific HDFS path so that it works regardless of which NameNode is currently active?
Tags: hdfs, hadoop
In this scenario, instead of checking for the active NameNode host and port combination, we should use a nameservice: the nameservice automatically routes client requests to the active NameNode.
The nameservice acts like a proxy in front of the NameNodes and always directs HDFS requests to the active one.
Example: hdfs://nameservice_id/file/path/in/hdfs
In the hdfs-site.xml file, create a nameservice by giving it an ID (here the nameservice_id is mycluster):
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
<description>Logical name for this new nameservice</description>
</property>
Now specify the NameNode IDs that identify the NameNodes in the cluster, using dfs.ha.namenodes.[$nameservice ID]:
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
<description>Unique identifiers for each NameNode in the nameservice</description>
</property>
Then link the NameNode IDs to the NameNode hosts, using dfs.namenode.rpc-address.[$nameservice ID].[$namenode ID]:
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>machine1.example.com:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>machine2.example.com:8020</value>
</property>
There are quite a few more properties involved in configuring NameNode HA properly with a nameservice.
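For example, HDFS clients usually also need a failover proxy provider so they can discover which NameNode is currently active; a minimal sketch for the mycluster nameservice, assuming Hadoop's built-in ConfiguredFailoverProxyProvider:
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
<description>Class the HDFS client uses to contact the active NameNode</description>
</property>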
With this setup, the HDFS URL for a file will look like this:
hdfs://mycluster/file/location/in/hdfs/wo/namenode/host
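Optionally, fs.defaultFS in core-site.xml can also point at the nameservice, so that paths without an explicit scheme or authority resolve through it as well; a minimal sketch, assuming the same mycluster nameservice (this property goes in core-site.xml, not hdfs-site.xml):
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
<description>Default filesystem, resolved through the nameservice</description>
</property>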
Applying the same properties in Java code:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
Configuration conf = new Configuration(false);
conf.set("dfs.nameservices", "mycluster");
conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
conf.set("dfs.namenode.rpc-address.mycluster.nn1", "machine1.example.com:8020");
conf.set("dfs.namenode.rpc-address.mycluster.nn2", "machine2.example.com:8020");
// the failover proxy provider lets the client find the active NameNode
conf.set("dfs.client.failover.proxy.provider.mycluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
// FileSystem.get() takes a URI and a Configuration, not a plain path string
FileSystem fsObj = FileSystem.get(URI.create("hdfs://mycluster"), conf);
// now use fsObj to perform HDFS shell-like operations, e.g.
boolean exists = fsObj.exists(new Path("relative/path/of/file/or/dir"));