
Get UTF-8 html content with Node's http.get

I'm trying to pull the HTML content of a given URL, and the page's original encoding is UTF-8. I get the page's HTML, but the text within the HTML elements comes back garbled (as question marks).

This is what I do:

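var http = require('http');
var url = require('url');

function getHtml(path, callback) {
    var parsedPath = url.parse(path);
    var options = {
        host: parsedPath.host,
        path: parsedPath.path,
        headers: {
            // Asks the server for UTF-8; it does not change how Node decodes the body
            'Accept-Charset': 'utf-8'
        }
    };

    http.get(options, function (res) {
        var data = "";
        res.on('data', function (chunk) {
            // chunk is a Buffer here; += converts each chunk to a string on its own
            data += chunk;
        });
        res.on("end", function () {
            console.log(data);
        });
    }).on("error", function () {
        callback(null);
    });
}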
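One plausible cause of the garbled text, offered as an assumption rather than a diagnosis: `data += chunk` converts each Buffer chunk to a string independently, so a multi-byte UTF-8 character that straddles two chunks is decoded as invalid bytes. The sketch below simulates this offline with a hypothetical two-chunk body instead of a real HTTP response.

```javascript
// Hypothetical body: 'héllo €' is 10 bytes in UTF-8; '€' is the 3 bytes E2 82 AC
var body = Buffer.from('héllo €', 'utf8');

// Simulate a response delivered in two chunks, split inside the '€' sequence
var chunks = [body.subarray(0, 8), body.subarray(8)];

// Same pattern as the 'data' handler above: string += Buffer calls
// chunk.toString(), which decodes each chunk on its own as UTF-8
var naive = "";
chunks.forEach(function (chunk) {
    naive += chunk;
});

console.log(naive === 'héllo €');      // false
console.log(naive.includes('\uFFFD')); // true: replacement characters appear
```

Invalid bytes are decoded to U+FFFD (the replacement character), which many terminals and fonts render as a question mark.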

How can I enforce the encoding of the returned data?

Thanks

asked Jan 17 '15 11:01 by Ben Diamant


1 Answer

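Call setEncoding('utf8') on the response before reading it, so that each 'data' event delivers a UTF-8-decoded string instead of a raw Buffer:

http.get(options, function (res) {
    // Decode the body as UTF-8; a multi-byte character split across
    // chunks is held back until complete instead of being mangled
    res.setEncoding('utf8');

    var data = "";
    res.on('data', function (chunk) {
        data += chunk; // chunk is now a string
    });
    res.on("end", function () {
        console.log(data);
    });
});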
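For illustration, here is an offline sketch of why this works, using a hypothetical body split inside the 3-byte '€' character. Readable streams rely on a StringDecoder for setEncoding(), and approach 1 below approximates that behaviour; approach 2 is a common alternative that keeps the raw Buffers and decodes once at the end.

```javascript
var StringDecoder = require('string_decoder').StringDecoder;

// Hypothetical UTF-8 body delivered in two chunks, split inside '€' (E2 82 AC)
var body = Buffer.from('héllo €', 'utf8');
var chunks = [body.subarray(0, 8), body.subarray(8)];

// 1) Roughly what res.setEncoding('utf8') does: the decoder buffers an
//    incomplete multi-byte sequence until the rest arrives
var decoder = new StringDecoder('utf8');
var viaDecoder = "";
chunks.forEach(function (chunk) {
    viaDecoder += decoder.write(chunk);
});
viaDecoder += decoder.end(); // flush any remaining bytes

// 2) Alternative: collect Buffers and decode the whole body once
var viaConcat = Buffer.concat(chunks).toString('utf8');

console.log(viaDecoder === 'héllo €'); // true
console.log(viaConcat === 'héllo €');  // true
```

In a real handler, approach 2 means pushing each chunk into an array in the 'data' callback and calling Buffer.concat(array).toString('utf8') in the 'end' callback.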
answered Oct 13 '22 23:10 by alexpods