 

What does "Client" exactly mean for Hadoop / HDFS?

Tags:

hadoop

hdfs

I understand the general concept behind it, but I would like more clarification and a clear-cut definition of what a "client" is.

For example, if I just type an hdfs command in the terminal, is it still a "client"?

Mehdi LAMRANI asked Apr 05 '17 04:04



People also ask

How does the client communicate with HDFS?

Clients communicate with HDFS through the Hadoop HDFS API. Client applications talk to the NameNode whenever they need to locate a file, or when they want to add, copy, move, or delete a file on HDFS.
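As a rough illustration of the metadata-only conversation described above, here is a toy Python model. `ToyNameNode` and its methods are hypothetical stand-ins, not part of the Hadoop HDFS API; the sketch only shows that locating, registering, renaming, and deleting files are namespace operations answered by the NameNode, while the block bytes stay on the DataNodes.

```python
class ToyNameNode:
    """Hypothetical stand-in for the NameNode (not Hadoop code): metadata only."""

    def __init__(self):
        # path -> list of (block_id, [DataNode hostnames holding a replica])
        self.namespace = {}

    def locate(self, path):
        """Metadata lookup: which blocks make up this file, and where are they?"""
        if path not in self.namespace:
            raise FileNotFoundError(path)
        return self.namespace[path]

    def add(self, path, blocks):
        """Record a new file's block list (the bytes themselves live on DataNodes)."""
        self.namespace[path] = blocks

    def rename(self, src, dst):
        """A move only rewrites the namespace entry; no block data is touched."""
        self.namespace[dst] = self.namespace.pop(src)

    def delete(self, path):
        """Drop the metadata entry (real HDFS later tells DataNodes to free the blocks)."""
        del self.namespace[path]


nn = ToyNameNode()
nn.add("/data/a.txt", [("blk_1", ["dn1", "dn2", "dn3"])])
nn.rename("/data/a.txt", "/archive/a.txt")
print(nn.locate("/archive/a.txt"))  # [('blk_1', ['dn1', 'dn2', 'dn3'])]
```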

How would a client write to a HDFS?

To write a file to HDFS, a client first contacts the master, i.e. the NameNode. The NameNode returns the addresses of the DataNodes (slaves) to which the client should write. The client then writes the data directly to those DataNodes, which form a write pipeline that passes the data from one DataNode to the next.
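The write path above can be sketched as a toy simulation. All class and function names here (`ToyDataNode`, `ToyNameNode`, `client_write`) are hypothetical and do not correspond to Hadoop's real classes; replica placement is simplified to "the first N DataNodes" instead of HDFS's rack-aware policy.

```python
class ToyDataNode:
    """Hypothetical DataNode: stores blocks and forwards them down the pipeline."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def receive(self, block_id, data, downstream):
        """Store the block, then forward it to the rest of the pipeline."""
        self.blocks[block_id] = data
        acks = [self.name]
        if downstream:
            acks += downstream[0].receive(block_id, data, downstream[1:])
        return acks  # acknowledgements travel back toward the client


class ToyNameNode:
    """Hypothetical NameNode: picks target DataNodes and records block metadata."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.block_map = {}

    def allocate_block(self, path, block_id):
        # Simplification: real HDFS uses rack-aware placement, not "first N nodes".
        targets = self.datanodes[: self.replication]
        self.block_map.setdefault(path, []).append(
            (block_id, [dn.name for dn in targets]))
        return targets


def client_write(namenode, path, block_id, data):
    pipeline = namenode.allocate_block(path, block_id)        # ask the NameNode where
    return pipeline[0].receive(block_id, data, pipeline[1:])  # send bytes to DataNodes


dns = [ToyDataNode(f"dn{i}") for i in range(1, 5)]
nn = ToyNameNode(dns)
print(client_write(nn, "/logs/app.log", "blk_1", b"hello"))  # ['dn1', 'dn2', 'dn3']
```

Note that the client only hands the block to the first DataNode; the replication to the others happens inside the pipeline, which is why the NameNode never sits on the data path.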

How does a client read a file from HDFS?

Step 1: The client opens the file it wishes to read by calling open() on the FileSystem object (which, for HDFS, is an instance of DistributedFileSystem).
Step 2: DistributedFileSystem (DFS) calls the NameNode, using remote procedure calls (RPCs), to determine the locations of the first few blocks in the file.
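The two steps above, plus the reads that follow, can be modelled in a few lines. This is a toy sketch with hypothetical names (`ToyNameNode`, `ToyInputStream`, `toy_open`), not Hadoop's DFSInputStream; it assumes block locations are returned in fixed-size batches and that the client simply reads from the first listed replica rather than the closest one.

```python
class ToyNameNode:
    """Hypothetical NameNode that answers block-location requests in batches."""

    def __init__(self, files, batch_size=2):
        self.files = files  # path -> list of (block_id, [DataNode names])
        self.batch_size = batch_size
        self.rpc_calls = 0

    def get_block_locations(self, path, start):
        """Stand-in for the RPC: locations for the next batch of blocks."""
        self.rpc_calls += 1
        return self.files[path][start:start + self.batch_size]


class ToyInputStream:
    def __init__(self, namenode, datanodes, path):
        self.nn, self.datanodes, self.path = namenode, datanodes, path
        # Step 2: the first few block locations are fetched when the file is opened.
        self.batch = namenode.get_block_locations(path, 0)
        self.next_index = len(self.batch)

    def read_all(self):
        data = b""
        while self.batch:
            for block_id, replicas in self.batch:
                # Real HDFS prefers the closest replica; the toy takes the first.
                data += self.datanodes[replicas[0]][block_id]
            # Ask the NameNode for more locations only when this batch is used up.
            self.batch = self.nn.get_block_locations(self.path, self.next_index)
            self.next_index += len(self.batch)
        return data


def toy_open(namenode, datanodes, path):
    """Step 1: opening returns a stream (hypothetical analogue of FileSystem.open())."""
    return ToyInputStream(namenode, datanodes, path)


datanodes = {"dn1": {"b1": b"ab", "b3": b"ef"}, "dn2": {"b2": b"cd"}}
nn = ToyNameNode({"/f.txt": [("b1", ["dn1"]), ("b2", ["dn2", "dn1"]), ("b3", ["dn1"])]})
stream = toy_open(nn, datanodes, "/f.txt")
print(stream.read_all())  # b'abcdef'
print(nn.rpc_calls)       # 3
```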

Which two events occur when a client writes a file to HDFS?

To write data to HDFS, the client first contacts the NameNode to obtain permission to write and the IP addresses of the DataNodes that will hold the data. The client then writes the data directly to those DataNodes.


2 Answers

In Hadoop, a client is the interface used to communicate with the Hadoop filesystem. Hadoop provides several types of clients for different tasks.

The basic filesystem client, hdfs dfs, is used to connect to a Hadoop filesystem and perform basic file-related tasks. It uses the ClientProtocol to communicate with the NameNode daemon, and connects directly to DataNodes to read/write block data. To perform administrative tasks on HDFS, there is hdfs dfsadmin; for HA-related tasks, there is hdfs haadmin. Similar clients are available for YARN-related tasks.

These clients can be invoked with their respective CLI commands from any node where Hadoop is installed and which has the configuration and libraries required to connect to a Hadoop filesystem. Such nodes are often referred to as Hadoop clients.
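To make the CLI clients concrete, here is a minimal Python wrapper sketch. The helper names `hdfs_command` and `run_hdfs` are hypothetical; the sketch assumes the `hdfs` launcher script is on `PATH`, and `nn1` below is a made-up NameNode service ID. Only the subcommands named above (`dfs`, `dfsadmin`, `haadmin`) are accepted.

```python
import shutil
import subprocess


def hdfs_command(tool, *args):
    """Hypothetical helper: build an argv list such as ['hdfs', 'dfs', '-ls', '/']."""
    if tool not in {"dfs", "dfsadmin", "haadmin"}:
        raise ValueError(f"unsupported tool: {tool}")
    return ["hdfs", tool, *args]


def run_hdfs(tool, *args):
    """Run the command if this node has an `hdfs` client on PATH; return stdout."""
    if shutil.which("hdfs") is None:
        raise RuntimeError("no hdfs client installed on this node")
    result = subprocess.run(hdfs_command(tool, *args),
                            capture_output=True, text=True, check=True)
    return result.stdout


print(hdfs_command("dfs", "-ls", "/"))      # ['hdfs', 'dfs', '-ls', '/']
print(hdfs_command("dfsadmin", "-report"))  # ['hdfs', 'dfsadmin', '-report']
# "nn1" is a hypothetical HA service ID; use the IDs from your own hdfs-site.xml.
print(hdfs_command("haadmin", "-getServiceState", "nn1"))
```

Building the argument list separately from running it keeps the command testable on machines that are not Hadoop clients.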

For example, if I just type an hdfs command in the terminal, is it still a "client"?

Technically, yes. If you are able to access the filesystem using the hdfs command, then the node has the configuration and libraries required to be a Hadoop client.
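The "configuration and libraries" test can be approximated with a heuristic check. This is a sketch under assumptions, not an official Hadoop tool: it treats a node as a client if the `hdfs` launcher is on `PATH` and `core-site.xml` in `HADOOP_CONF_DIR` defines `fs.defaultFS`. The function names `read_default_fs` and `looks_like_hadoop_client` are hypothetical, and `namenode.example:8020` is a made-up address.

```python
import os
import shutil
import tempfile
import xml.etree.ElementTree as ET


def read_default_fs(conf_dir):
    """Return the fs.defaultFS value from conf_dir/core-site.xml, or None."""
    path = os.path.join(conf_dir, "core-site.xml")
    if not os.path.isfile(path):
        return None
    for prop in ET.parse(path).getroot().iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value")
    return None


def looks_like_hadoop_client(conf_dir=None):
    """Heuristic: client tools are installed and a cluster address is configured."""
    conf_dir = conf_dir or os.environ.get("HADOOP_CONF_DIR")
    if not conf_dir or shutil.which("hdfs") is None:
        return False
    return read_default_fs(conf_dir) is not None


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "core-site.xml"), "w") as f:
        f.write("<configuration><property><name>fs.defaultFS</name>"
                "<value>hdfs://namenode.example:8020</value></property></configuration>")
    print(read_default_fs(d))  # hdfs://namenode.example:8020
```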

PS: APIs (such as the Java FileSystem API) are also available to create these clients programmatically.
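Hadoop's primary programmatic client is the Java FileSystem API. From Python, one alternative is the WebHDFS REST API, which is a different technique from the native RPC client. The sketch below assumes WebHDFS is enabled and the NameNode HTTP port is 9870 (the Hadoop 3 default; Hadoop 2 used 50070); the helper names `webhdfs_url` and `list_status` and the host `namenode.example` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def webhdfs_url(host, path, op, port=9870, **params):
    """Build http://<host>:<port>/webhdfs/v1<path>?op=<OP>&... (hypothetical helper)."""
    query = urllib.parse.urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{urllib.parse.quote(path)}?{query}"


def list_status(host, path, user):
    """Call LISTSTATUS and return the FileStatus entries (needs a live cluster)."""
    url = webhdfs_url(host, path, "LISTSTATUS", **{"user.name": user})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["FileStatuses"]["FileStatus"]


print(webhdfs_url("namenode.example", "/user/alice", "LISTSTATUS", **{"user.name": "alice"}))
# http://namenode.example:9870/webhdfs/v1/user/alice?op=LISTSTATUS&user.name=alice
```

`user.name` is WebHDFS's simple (pseudo) authentication parameter; a Kerberos-secured cluster would need a different authentication setup.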

franklinsijo answered Oct 10 '22 17:10



Edge nodes are the interface between the Hadoop cluster and the outside network. An edge node has all the client libraries and components installed, as well as the current cluster configuration needed to connect to HDFS. This thread discusses the same topic.

SurjanSRawat answered Oct 10 '22 17:10
