
Which nodejs library should I use to write into HDFS?

I have a Node.js application and I want to write data into the Hadoop HDFS file system. I have seen two main Node.js libraries that can do it: node-hdfs and node-webhdfs. Has anyone tried them? Any hints? Which one should I use in production?

I am inclined to use node-webhdfs since it uses the WebHDFS REST API. node-hdfs seems to be a C++ binding.

Any help will be greatly appreciated.

asked Jan 05 '14 by user3161639


2 Answers

You may want to check out the webhdfs library. It provides a nice and straightforward interface (similar to the fs module API) for WebHDFS REST API calls.
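createClient can also take explicit connection options if the defaults don't match your cluster; a sketch based on the library's documented options, where every value below is a placeholder:

var WebHDFS = require('webhdfs');

// Explicit connection options (placeholder values -- adjust for your cluster)
var hdfs = WebHDFS.createClient({
  user: 'webuser',     // user to perform operations as
  host: 'localhost',   // NameNode host
  port: 50070,         // WebHDFS port
  path: '/webhdfs/v1'  // REST API root path
});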

Writing to the remote file:

var WebHDFS = require('webhdfs');
var fs = require('fs');

var hdfs = WebHDFS.createClient();

// Stream a local file into HDFS
var localFileStream = fs.createReadStream('/path/to/local/file');
var remoteFileStream = hdfs.createWriteStream('/path/to/remote/file');

localFileStream.pipe(remoteFileStream);

remoteFileStream.on('error', function onError (err) {
  // Do something with the error
});

remoteFileStream.on('finish', function onFinish () {
  // Upload is done
});

Reading from the remote file:

var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient();

var remoteFileStream = hdfs.createReadStream('/path/to/remote/file');

remoteFileStream.on('error', function onError (err) {
  // Do something with the error
});

remoteFileStream.on('data', function onChunk (chunk) {
  // Do something with the data chunk
});

remoteFileStream.on('end', function onEnd () {
  // Reading is done
});
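If you don't need streaming, the same library also exposes simple callback-style helpers; a short sketch with placeholder paths:

var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient();

// Write a whole string/buffer in one call
hdfs.writeFile('/path/to/remote/file', 'hello world', function (err) {
  // err is undefined on success
});

// Read the whole file back into memory
hdfs.readFile('/path/to/remote/file', function (err, data) {
  // data is a Buffer with the file contents
});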

answered Oct 08 '22 by Harri Siirak


Not good news!!!

Do not use node-hdfs. Although it looks promising, it is now two years out of date. I tried to compile it, but it no longer matches the symbols of the current libhdfs. If you want something like that, you will have to write your own Node.js binding.

You can use node-webhdfs, but IMHO there is not much advantage in it. It is better to use an HTTP Node.js library and make the requests yourself. The hardest part is dealing with the very asynchronous nature of Node.js: you might first want to create a folder, then, after it has been created successfully, create a file, and finally write or append data. Everything happens through HTTP requests that you must send and whose responses you must wait for before moving on.
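For illustration, here is a minimal sketch of that approach using only Node's built-in http module. The NameNode address (localhost:50070) and the user name (hdfs) are placeholder assumptions; it chains the mkdir and create steps described above. Note that op=CREATE is a two-step dance: the NameNode answers with a 307 redirect, and the actual data goes in a second PUT to the DataNode it points at.

var http = require('http');
var url = require('url');

// Placeholder connection details -- adjust for your cluster
var NAMENODE = { host: 'localhost', port: 50070 };
var USER = 'hdfs';

// Step 1: create the parent directory (PUT ...?op=MKDIRS)
function mkdirs(path, callback) {
  var req = http.request({
    method: 'PUT',
    host: NAMENODE.host,
    port: NAMENODE.port,
    path: '/webhdfs/v1' + path + '?op=MKDIRS&user.name=' + USER
  }, function (res) {
    res.resume(); // drain the response body
    callback(res.statusCode === 200 ? null : new Error('MKDIRS failed: ' + res.statusCode));
  });
  req.on('error', callback);
  req.end();
}

// Step 2: create a file (PUT ...?op=CREATE). The NameNode replies with a
// 307 redirect to a DataNode; the file content goes in a second PUT there.
function createFile(path, data, callback) {
  var req = http.request({
    method: 'PUT',
    host: NAMENODE.host,
    port: NAMENODE.port,
    path: '/webhdfs/v1' + path + '?op=CREATE&user.name=' + USER
  }, function (res) {
    res.resume();
    if (res.statusCode !== 307) {
      return callback(new Error('Expected 307 redirect, got ' + res.statusCode));
    }
    // Follow the redirect and send the actual data to the DataNode
    var loc = url.parse(res.headers.location);
    var dataReq = http.request({
      method: 'PUT',
      host: loc.hostname,
      port: loc.port,
      path: loc.path
    }, function (dataRes) {
      dataRes.resume();
      callback(dataRes.statusCode === 201 ? null : new Error('CREATE failed: ' + dataRes.statusCode));
    });
    dataReq.on('error', callback);
    dataReq.end(data);
  });
  req.on('error', callback);
  req.end();
}

// Chain the async steps exactly as described: mkdir first, then create.
mkdirs('/tmp/demo', function (err) {
  if (err) throw err;
  createFile('/tmp/demo/hello.txt', 'hello from node', function (err) {
    if (err) throw err;
    console.log('file written');
  });
});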

At least node-webhdfs might be a good reference to look at before starting your own code.

Br, Fabio Moreira

answered Oct 08 '22 by Fabio Moreira