
node.js: how to HTTP-get and decode/encode response in custom format

This page lists the encodings supported in node.js: here or here. Many popular (or formerly popular) encodings are missing, such as windows-1252.
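To make the gap concrete, here is a minimal sketch that uses only Node's built-in Buffer. The sample bytes are made up for illustration and do not come from the page in question. It shows that neither of the closest built-in encodings decodes windows-1252 data correctly.

```javascript
// "café €" as windows-1252 bytes: é is 0xE9 and € is 0x80.
var bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9, 0x20, 0x80]);

// Buffer does not know windows-1252 as an encoding name.
var supported = Buffer.isEncoding('windows-1252');

// Decoding as UTF-8 replaces both non-ASCII bytes with U+FFFD.
var asUtf8 = bytes.toString('utf8');

// 'latin1' gets é right but maps 0x80 to the control character U+0080, not €.
var asLatin1 = bytes.toString('latin1');

console.log(supported, JSON.stringify(asUtf8), JSON.stringify(asLatin1));
```

The 0x80 to 0x9F range is where windows-1252 and latin1 differ, so text containing characters such as € or curly quotes is where the mismatch shows up.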

I want to fetch a webpage that is encoded in windows-1252, parse the response, and finally save it to a file. I'm having trouble with the encoding. I've made lots of different attempts and my mind is blowing up :(

So I know there are iconv and iconv-lite modules for node.js that support more encodings than node.js itself does. I'd like to use iconv-lite, since I'm unable to compile the stuff required for iconv on my company machine. Anyway, I've got:

var iconv = require('iconv-lite');

Now, the difficult part: fetching the response. As I wrote, my resource lies somewhere on the web, so I need to fire an HTTP request. I've tried node-wget (npm: wget module), http.request and http.get, and all of those attempts failed.

I also googled, and the closest solution to what I need seems to be "nodejs encoding using request" (https://stackoverflow.com/a/22027928/769384), but the author didn't say what request is there. Is it a node module? How does he load it?

I've also read https://groups.google.com/forum/#!topic/nodejs/smA6-jGq2pw, but found no clean solution there.

I'd appreciate a minimal piece of code that enables me to fetch a web document and convert it on the fly from windows-1252 encoding to UTF-8. The only parameter is the URL of the document.

asked Feb 09 '23 09:02 by ducin


1 Answer

Here's an example using iconv-lite and http (I didn't add any error handling, but it's just to give an idea on how to implement something like this):

var http  = require('http');
var iconv = require('iconv-lite');

function retrieve(url, callback) {
  http.get(url, function(res) {
    var chunks = [];

    // Collect all the response chunks.
    res.on('data', function(chunk) {
      chunks.push(chunk);
    });

    // The response has been fully read here.
    res.on('end', function() {
      // Collect all the chunks into one buffer.
      var buffer = Buffer.concat(chunks);

      // Decode the windows-1252 bytes into a JavaScript string.
      var str = iconv.decode(buffer, 'windows-1252');

      // Call the callback with the string.
      return callback(null, str);
    });
  });
}

// To use:
retrieve(YOUR_URL, function(err, html) {
  console.log(html);
});
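As a possible alternative for the chunk-collecting approach above, Node's built-in util.TextDecoder can also decode windows-1252, assuming a Node.js build with full ICU data (an assumption about the environment, not something the answer states). The helper name decodeChunks and the sample bytes below are hypothetical; this is a rough sketch, not a replacement for iconv-lite.

```javascript
var util = require('util');

// Hypothetical helper (not part of iconv-lite): decode an array of Buffer
// chunks with the built-in TextDecoder. Assumes a Node.js build with full
// ICU; otherwise the constructor throws for 'windows-1252'.
function decodeChunks(chunks, encoding) {
  var decoder = new util.TextDecoder(encoding);
  var text = '';

  chunks.forEach(function(chunk) {
    // stream: true keeps decoder state between chunks.
    text += decoder.decode(chunk, { stream: true });
  });

  // A final call without input flushes any buffered bytes.
  return text + decoder.decode();
}

// Made-up sample: "café €" in windows-1252, split across two chunks.
var sample = [Buffer.from([0x63, 0x61, 0x66]), Buffer.from([0xe9, 0x20, 0x80])];
var decoded = decodeChunks(sample, 'windows-1252');
console.log(decoded);
```

In retrieve(), the chunks array collected in the 'data' handler could be passed to such a helper in place of Buffer.concat plus iconv.decode.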

EDIT: just noticed that iconv-lite supports streams too. Here's a much smaller version of the retrieve() function:

function retrieve(url, callback) {
  http.get(url, function(res) {
    res.pipe(iconv.decodeStream('win1252')).collect(callback);
  });
}
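The question also asks to save the result to a file as UTF-8. Once the response has been decoded into a JavaScript string, writing it with the 'utf8' encoding produces UTF-8 bytes on disk. The sketch below assumes the string is the html value passed to the callback; the helper name saveAsUtf8, the sample text and the temporary file path are hypothetical.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper: write an already-decoded string to disk as UTF-8.
// The synchronous call keeps the sketch short; fs.writeFile works the same
// way with a callback.
function saveAsUtf8(filePath, text) {
  fs.writeFileSync(filePath, text, { encoding: 'utf8' });
}

// Made-up sample standing in for the decoded page.
var outFile = path.join(os.tmpdir(), 'page-utf8.html');
saveAsUtf8(outFile, 'caf\u00e9 \u20ac');

// Read the raw bytes back to inspect what was written.
var written = fs.readFileSync(outFile);
console.log(written);
```

In real use, saveAsUtf8 would be called from inside the callback given to retrieve(), with the path of the output file.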
answered Feb 12 '23 00:02 by robertklep