How can I read a file encoded in utf-16 in nodejs?

Tags:

node.js

utf-16

I have to read a file encoded in UTF-16 using nodejs (in chunks because it is very large). The data from the file will go into a mongodb, so I will need to convert it into utf-8. From googling, it seems that this is just plain not supported by Node, and I will have to resort to converting the raw data from a buffer myself. But I also think there ought to be a better way and I'm just not finding it. Any suggestions?

Thanks.

Ryan Ballantyne asked Jun 07 '12 21:06



1 Answer

Replace the normal utf8 you'd have when reading a text file with utf16le or ucs2:

var fileContents = fs.readFileSync('import.csv','utf16le') 

or, equivalently (Node treats ucs2 as an alias of utf16le):

var fileContents = fs.readFileSync('import.csv','ucs2') 

Also, for anyone finding this through a search: if extra � (replacement, "question mark") characters show up in a parsed file, reading a UTF-16 file as UTF-8 is probably the cause. Read the file as UTF16/UCS2 and the extra characters will disappear.
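Since the question asks about reading a very large file in chunks, here is a minimal sketch of the same idea with a stream instead of readFileSync. It assumes the file is little-endian UTF-16 (Node has no built-in big-endian UTF-16 encoding, so a big-endian file would need its bytes swapped first). The helper name readUtf16InChunks, its onChunk callback, and the highWaterMark parameter are made up for illustration; fs.createReadStream and its encoding option are the real Node API. Setting the encoding on the stream lets Node's decoder carry incomplete code units and surrogate pairs over from one chunk to the next.

```javascript
const fs = require('fs');

// Hypothetical helper: streams a UTF-16LE file and hands decoded text to onChunk.
// highWaterMark is the read size in bytes (illustrative default of 64 KiB).
function readUtf16InChunks(filePath, onChunk, highWaterMark = 64 * 1024) {
  return new Promise((resolve, reject) => {
    // With an encoding set, the stream emits strings, not Buffers, and keeps
    // any trailing partial character until the next chunk arrives.
    const stream = fs.createReadStream(filePath, { encoding: 'utf16le', highWaterMark });
    let atStart = true;

    stream.on('data', (text) => {
      if (atStart) {
        atStart = false;
        // Drop a leading byte order mark (U+FEFF) if the file has one.
        if (text.charCodeAt(0) === 0xfeff) text = text.slice(1);
      }
      if (text.length > 0) onChunk(text);
    });
    stream.on('error', reject);
    stream.on('end', resolve);
  });
}
```

A call such as readUtf16InChunks('import.csv', (text) => process.stdout.write(text)) would print the file piece by piece. Once decoded, the chunks are ordinary JavaScript strings, so no manual UTF-8 conversion step should be needed before handing them to a database driver.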

mikemaccana answered Sep 30 '22 18:09