I have a nodejs application and I want to write data into hadoop HDFS file system. I have seen two main nodejs libraries that can do it: node-hdfs and node-webhdfs. Someone have tried it? Any hints? Which one should I use in production?
I am inclined to use node-webhdfs since it uses WebHDFS REST API. node-hdfs seem to be a c++ binding.
Any help will be greatly appreciated.
js and Hadoop to store distributed data. Hadoop is a well-known open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
Node. js is primarily used for non-blocking, event-driven servers, due to its single-threaded nature. It's used for traditional web sites and back-end API services, but was designed with real-time, push-based architectures in mind.
Node. js can handle many concurrent requests. This is the main reason it quickly became popular among developers and large companies. It can handle many simultaneous requests without straining the server.
You may want to check out webhdfs library. It provides nice and straightforward (similar to fs
module API) interface for WebHDFS REST API calls.
Writing to the remote file:
var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient();
var localFileStream = fs.createReadStream('/path/to/local/file');
var remoteFileStream = hdfs.createWriteStream('/path/to/remote/file');
localFileStream.pipe(remoteFileStream);
remoteFileStream.on('error', function onError (err) {
// Do something with the error
});
remoteFileStream.on('finish', function onFinish () {
// Upload is done
});
Reading from the remote file:
var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient();
var remoteFileStream = hdfs.createReadStream('/path/to/remote/file');
remoteFileStream.on('error', function onError (err) {
// Do something with the error
});
remoteFileStream.on('data', function onChunk (chunk) {
// Do something with the data chunk
});
remoteFileStream.on('finish', function onFinish () {
// Upload is done
});
Not good news!!!
Do not use node-hdfs. Although it seems promising, it is now two years obsolete. I've tried to compile it but it does not match the symbols of current libhdfs. If you want to use something like that you'll have to make your own nodejs binding.
You can use node-webhdfs but IMHO there's not much advantage on that. It is better to use an http nodejs lib to make your own requests. The hardest part here is try to hold the very async nature of nodejs, since you might want first to create a folder, and then after successfully create it, create a file and then, at last, write or append data. Everything through http requests that you must send and wait the for answer to then go on....
At least node-webhdfs might be a good reference to you take a look and start your own code.
Br, Fabio Moreira
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With