 

Node.js buf.toString vs String.fromCharCode

I'm attempting to display the character í from 0xed (237).

String.fromCharCode yields the correct result:

String.fromCharCode(0xed); // 'í'

However, when using a Buffer:

var buf = new Buffer(1);
buf.writeUInt8(0xed,0); // <Buffer ed>
buf.toString('utf8'); // '?', same as buf.toString()
buf.toString('binary'); // 'í'

The 'binary' encoding for Buffer.toString is deprecated, so I want to avoid it.

Second, I also expect incoming data to be multibyte (i.e. UTF-8), e.g.:

String.fromCharCode(0x0512); // Ԓ - correct
var buf = new Buffer(2);
buf.writeUInt16LE(0x0512,0); // <Buffer 12 05>, [0x0512 & 0xff, 0x0512 >> 8]
buf.toString('utf8'); // Ԓ - correct
buf.toString('binary'); // Ô

Note that the two examples behave inconsistently with each other.

So, what am I missing? What am I assuming that I shouldn't? Is String.fromCharCode magical?

asked Aug 22 '13 by zamnuts

1 Answer

It seems you might be assuming that Strings and Buffers use the same bit width and encoding.

JavaScript Strings are sequences of 16-bit UTF-16 code units, while Node's Buffers are sequences of raw 8-bit bytes.
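You can observe the mismatch directly; a minimal sketch (both calls exist in all Node versions):

'í'.length;                     // 1: one UTF-16 code unit
Buffer.byteLength('í', 'utf8'); // 2: two bytes once encoded as UTF-8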

UTF-8 is also a variable-length encoding, with code points consuming between 1 and 4 bytes. The UTF-8 encoding of í, for example, takes 2 bytes:

> new Buffer('í', 'utf8')
<Buffer c3 ad>

And, on its own, 0xed is not valid UTF-8: it is the lead byte of a three-byte sequence, so with no continuation bytes after it the decoder emits the replacement character (the ? / � you saw, representing an "unknown character"). It is, however, a valid UTF-16 code unit for use with String.fromCharCode().
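To see both halves of that, a minimal sketch (Buffer.from(array) assumes Node ≥ 4; on the Node of this era, new Buffer([...]) behaves the same way):

Buffer.from([0xed]).toString('utf8');       // '�' (U+FFFD): an incomplete UTF-8 sequence
Buffer.from([0xc3, 0xad]).toString('utf8'); // 'í': the complete two-byte sequence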

Also, the output you show for the second example isn't correct:

var buf = new Buffer(2);
buf.writeUInt16LE(0x0512, 0);      // <Buffer 12 05>
console.log(buf.toString('utf8')); // '\u0012\u0005' (two control characters, not 'Ԓ')
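As an aside (my addition, not part of the original answer): since writeUInt16LE stored a little-endian 16-bit code unit, that buffer holds valid UTF-16LE data, and Node can decode it as such:

var buf = new Buffer(2);              // Buffer.alloc(2) on modern Node
buf.writeUInt16LE(0x0512, 0);         // <Buffer 12 05>
console.log(buf.toString('utf16le')); // 'Ԓ': decoded as UTF-16LE, not UTF-8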

You can detour through String.fromCharCode() to see the UTF-8 encoding:

var buf = new Buffer(String.fromCharCode(0x0512), 'utf8');
console.log(buf); // <Buffer d4 92>
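Putting the pieces together, a hypothetical helper (the name codePointToUtf8 is mine, not from the answer) that UTF-8-encodes a single BMP code point by round-tripping through a string:

// Hypothetical helper: UTF-8-encode one BMP code point via a string
function codePointToUtf8(cp) {
    return new Buffer(String.fromCharCode(cp), 'utf8'); // Buffer.from(..., 'utf8') on modern Node
}

codePointToUtf8(0xed);   // <Buffer c3 ad>
codePointToUtf8(0x0512); // <Buffer d4 92>

Decoding either result with buf.toString('utf8') round-trips back to the original character.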
answered by Jonathan Lonowski