
WebHDFS vs HttpFS


What is the difference between the WebHDFS REST API and HttpFS?

If I understand correctly:

  • HttpFS is an independent service that exposes a REST API on top of HDFS
  • WebHDFS is a REST API built into HDFS. It doesn't require any further installation

Am I correct?

When would it be advisable to use one instead of the other?

Santiago Cepas asked Jul 30 '14 09:07





1 Answer

I have read an article related to your question. Here is the link:

https://www.linkedin.com/today/post/article/20140717115238-176301000-accessing-hdfs-using-the-webhdfs-rest-api-vs-httpfs

WebHDFS vs HttpFS

The major difference between WebHDFS and HttpFS: a WebHDFS client needs network access to all nodes of the cluster, and when data is read it is transmitted directly from the node that holds it. With HttpFS, a single node acts as a "gateway" and is the single point of data transfer to the client. So HttpFS could be choked during a large file transfer, but the good thing is that it minimizes the footprint required to access HDFS.
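To make the difference concrete, here is a minimal sketch of how a client would address each service. It only builds URLs and does not contact a cluster. The host names are hypothetical, and the ports are the usual defaults as far as I know (9870 for the NameNode web port in Hadoop 3.x, 50070 in 2.x, and 14000 for HttpFS), so treat them as assumptions and check your own configuration. Both services share the same `/webhdfs/v1` REST scheme, which is why the same helper works for both.

```python
from urllib.parse import quote, urlencode

# Assumed default ports; verify against your cluster's configuration.
WEBHDFS_PORT = 9870   # NameNode HTTP port in Hadoop 3.x (50070 in 2.x)
HTTPFS_PORT = 14000   # default HttpFS gateway port


def rest_url(host, port, hdfs_path, op, **params):
    """Build a WebHDFS-style REST URL (HttpFS uses the same URL scheme)."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


# Hypothetical hosts, for illustration only.
# WebHDFS: the request goes to the NameNode, which redirects the client
# to a DataNode, so the client must be able to reach every DataNode.
webhdfs_url = rest_url("namenode.example.com", WEBHDFS_PORT, "/data/file.txt", "OPEN")

# HttpFS: the request goes to the gateway, which streams the data itself,
# so the client only needs to reach this one host.
httpfs_url = rest_url("httpfs.example.com", HTTPFS_PORT, "/data/file.txt", "OPEN")

print(webhdfs_url)
print(httpfs_url)
```

In practice this means a client behind a firewall can usually work with HttpFS by opening a single host and port, while WebHDFS needs the NameNode and all DataNodes to be reachable.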

Likoed answered Sep 24 '22 06:09
