Converting a string from utf8 to latin1 in NodeJS

I'm using a Latin1-encoded DB and can't change it to UTF-8, which means I run into issues with certain application data. I'm using Tesseract to OCR a document (Tesseract outputs UTF-8) and tried to use iconv-lite; however, it creates a buffer, and I then have to convert that buffer into a string. But again, buffer-to-string conversion does not allow the "latin1" encoding.

I've read a bunch of questions and answers; however, all I find is advice about setting the client encoding and the like.

Any ideas?

antjanus asked Feb 18 '15 21:02

2 Answers

Since Node.js v7.1.0, you can use the transcode function from the buffer module:
https://nodejs.org/api/buffer.html#buffer_buffer_transcode_source_fromenc_toenc

For example:

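const buffer = require('buffer');

// utf8String is the JavaScript string to convert (e.g. the Tesseract output).
// Re-encode its UTF-8 bytes as Latin-1 bytes, then read them back as a string.
const latin1Buffer = buffer.transcode(Buffer.from(utf8String), "utf8", "latin1");
const latin1String = latin1Buffer.toString("latin1");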
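
As a quick check of what this produces, here is a minimal sketch with a made-up input (`sample` is hypothetical, not from the question). It assumes a Node.js build with ICU support, which `buffer.transcode` relies on; according to the Node.js documentation, characters that cannot be represented in the target encoding are replaced by a substitution character.

```javascript
const buffer = require('buffer');

// Hypothetical sample input; in the question this would be the Tesseract output.
const sample = 'caf\u00e9'; // "café"

const utf8Bytes = Buffer.from(sample, 'utf8');
const latin1Bytes = buffer.transcode(utf8Bytes, 'utf8', 'latin1');

console.log(utf8Bytes.length);            // 5: "é" takes two bytes in UTF-8
console.log(latin1Bytes.length);          // 4: "é" is the single byte 0xE9 in Latin-1
console.log(latin1Bytes.toString('hex')); // 636166e9
```

If the database driver accepts buffers, `latin1Bytes` may be usable directly without converting it back to a string.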
CedX answered Oct 22 '22 14:10


You can create a buffer from the UTF-8 string you have, and then decode that buffer as Latin-1 using iconv-lite, like this:

var iconv  = require('iconv-lite');
var buff   = Buffer.from(tesseract_string, 'utf8'); // new Buffer() is deprecated
var DB_str = iconv.decode(buff, 'ISO-8859-1');
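
To see what decoding UTF-8 bytes as ISO-8859-1 produces, the sketch below uses only Node's built-in `Buffer` with a made-up input. It assumes that `toString('latin1')` maps bytes to characters the same way `iconv.decode(buff, 'ISO-8859-1')` does; that should hold for this sample, but treat it as an assumption. Each UTF-8 byte becomes one character, so a non-ASCII character turns into two characters, whereas `Buffer.from(str, 'latin1')` yields the one-byte Latin-1 encoding.

```javascript
// Hypothetical sample input standing in for tesseract_string.
const input = 'caf\u00e9'; // "café"

// Same idea as the snippet above: UTF-8 bytes read back as Latin-1 characters.
const reinterpreted = Buffer.from(input, 'utf8').toString('latin1');

// For comparison: the string encoded directly as Latin-1 bytes.
const latin1Encoded = Buffer.from(input, 'latin1');

console.log(reinterpreted.length);                // 5 characters instead of 4
console.log(reinterpreted === 'caf\u00c3\u00a9'); // true: "é" became "Ã©"
console.log(latin1Encoded.toString('hex'));       // 636166e9
```

Which of the two forms the database ends up storing correctly depends on how the driver encodes strings on the wire, so it is worth checking a stored value that contains a non-ASCII character.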
adeneo answered Oct 22 '22 12:10