These pages list the encodings supported in node.js: here or here. Many popular (or once popular) encodings are missing, such as windows-1252.
I want to fetch a webpage that is in windows-1252, parse the response and, finally, save it into a file. I'm having trouble with the encoding. I've tried lots of different approaches and my mind is blowing up :(
So I know there are iconv and iconv-lite modules in node.js that support more encodings than node.js does. I'd like to use iconv-lite, since I'm unable to compile the stuff required for iconv on my company machine. Anyway, I've got
var iconv = require('iconv-lite');
Now, the difficult part: fetching the response. As I wrote, my resource lies somewhere on the web, so I need to fire an HTTP request. I've tried node-wget (the wget module on npm), http.request and http.get, and all of those attempts failed.
I also googled, and the closest solution to what I need seems to be "nodejs encoding using request" (https://stackoverflow.com/a/22027928/769384), but the author didn't explain what the hell request is there. Is it a node module? How does he load it?
I've also read https://groups.google.com/forum/#!topic/nodejs/smA6-jGq2pw, but found no clean solution there.
I'd appreciate a minimal piece of code that enables me to fetch a web document and convert it on the fly from windows-1252 encoding to UTF-8. The only parameter is the URL of the document.
Here's an example using iconv-lite and http (I didn't add any error handling, but it's just to give an idea of how to implement something like this):
var http = require('http');
var iconv = require('iconv-lite');

function retrieve(url, callback) {
  http.get(url, function(res) {
    var chunks = [];

    // Collect all the response chunks.
    res.on('data', function(chunk) {
      chunks.push(chunk);
    });

    // The response has been fully read here.
    res.on('end', function() {
      // Collect all the chunks into one buffer.
      var buffer = Buffer.concat(chunks);

      // Decode the windows-1252 bytes into a JavaScript string.
      var str = iconv.decode(buffer, 'windows-1252');

      // Call the callback with the string.
      return callback(null, str);
    });
  });
}

// To use:
retrieve(YOUR_URL, function(err, html) {
  console.log(html);
});
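Since the example above skips error handling, here is a rough sketch of what it could look like with the status code and error events checked. To keep it runnable without third-party modules it swaps iconv-lite for Node's built-in TextDecoder, which is an assumption on my part: the 'windows-1252' label needs a Node.js build with full ICU data. The helper name decodeChunks is made up for illustration; iconv.decode(buffer, 'windows-1252') should slot in the same way.

```javascript
var http = require('http');

// Hypothetical helper: join the chunks and decode them as windows-1252.
// Assumes a Node.js build whose TextDecoder supports 'windows-1252'.
function decodeChunks(chunks) {
  return new TextDecoder('windows-1252').decode(Buffer.concat(chunks));
}

function retrieve(url, callback) {
  var called = false;

  // Make sure the callback fires only once, even if several events report a failure.
  function done(err, str) {
    if (!called) {
      called = true;
      callback(err, str);
    }
  }

  var req = http.get(url, function(res) {
    if (res.statusCode !== 200) {
      res.resume(); // Discard the body so the socket is released.
      return done(new Error('Unexpected status code: ' + res.statusCode));
    }

    var chunks = [];
    res.on('data', function(chunk) {
      chunks.push(chunk);
    });
    res.on('error', done);
    res.on('end', function() {
      done(null, decodeChunks(chunks));
    });
  });

  // Network-level failures (DNS lookup, refused connection) are reported here.
  req.on('error', done);
}
```

With this variant the err argument in the usage callback is actually populated, so it is worth checking it before touching html.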
EDIT: just noticed that iconv-lite supports streams too. Here's a much smaller version of the retrieve() function:
function retrieve(url, callback) {
  http.get(url, function(res) {
    res.pipe(iconv.decodeStream('win1252')).collect(callback);
  });
}
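The question also asks about saving the result into a file as UTF-8. A minimal sketch of that last step, assuming the raw windows-1252 bytes are already in a Buffer: decode them to a string, then let fs.writeFile encode the string as UTF-8. The function name saveAsUtf8 is hypothetical, and the built-in TextDecoder stands in for iconv-lite here so the snippet has no dependencies (it needs a Node.js build with full ICU data for the 'windows-1252' label).

```javascript
var fs = require('fs');

// Hypothetical helper: decode a windows-1252 buffer and write it to disk as UTF-8.
function saveAsUtf8(buffer, filePath, callback) {
  // Bytes -> JavaScript string (TextDecoder used in place of iconv.decode).
  var text = new TextDecoder('windows-1252').decode(buffer);

  // JavaScript string -> UTF-8 bytes on disk.
  fs.writeFile(filePath, text, 'utf8', callback);
}
```

If you already have the decoded string from retrieve(), only the fs.writeFile call is needed; the 'utf8' argument is what determines the encoding of the file.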