I'm attempting to display the character í from 0xed (237). String.fromCharCode yields the correct result:
String.fromCharCode(0xed); // 'í'
However, when using a Buffer:
var buf = new Buffer(1);
buf.writeUInt8(0xed,0); // <Buffer ed>
buf.toString('utf8'); // '?', same as buf.toString()
buf.toString('binary'); // 'í'
Using 'binary' with Buffer.toString is slated for deprecation, so I want to avoid it.
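For reference, a minimal sketch of the non-deprecated spelling of that same single-byte encoding, assuming a Node version (6.4+) where the 'latin1' alias exists:
var buf = new Buffer(1);
buf.writeUInt8(0xed, 0);
buf.toString('latin1'); // 'í' - 'latin1' is the modern name for 'binary'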
Second, I can also expect incoming data to be multibyte (e.g. UTF-8):
String.fromCharCode(0x0512); // Ԓ - correct
var buf = new Buffer(2);
buf.writeUInt16LE(0x0512,0); // <Buffer 12 05>, [0x0512 & 0xff, 0x0512 >> 8]
buf.toString('utf8'); // Ԓ - correct
buf.toString('binary'); // Ô
Note that the two examples are inconsistent: in the first, 'binary' decodes correctly and 'utf8' doesn't; in the second, the reverse.
So, what am I missing? What am I assuming that I shouldn't? Is String.fromCharCode magical?
Seems you might be assuming that Strings and Buffers use the same bit-length and encoding. JavaScript Strings are 16-bit UTF-16 sequences, while Node's Buffers are sequences of 8-bit bytes.
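A quick sketch to make that difference concrete:
var s = String.fromCharCode(0x0512); // 'Ԓ', a single UTF-16 code unit
s.length;                            // 1 - String counts 16-bit units
Buffer.byteLength(s, 'utf8');        // 2 - Buffer counts 8-bit bytes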
UTF-8 is also a variable-length encoding, with code points consuming between 1 and 4 bytes. The UTF-8 encoding of í, for example, takes 2 bytes:
> new Buffer('í', 'utf8')
<Buffer c3 ad>
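Decoding those two bytes back as UTF-8 recovers the character:
> new Buffer([0xc3, 0xad]).toString('utf8')
'í'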
And, on its own, 0xed is not valid UTF-8 (it is a lead byte that must be followed by continuation bytes), thus the ? representing an "unknown character." It is, however, a valid UTF-16 code unit for use with String.fromCharCode().
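Both sides of that in one sketch (the '�' is U+FFFD, the replacement character the decoder substitutes for invalid input; your terminal may render it as ?):
new Buffer([0xed]).toString('utf8'); // '�' - a lone 0xed is invalid UTF-8
String.fromCharCode(0xed);           // 'í' - but 0xed is a valid UTF-16 code unit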
Also, the output you suggest for the 2nd example doesn't seem correct.
var buf = new Buffer(2);
buf.writeUInt16LE(0x0512, 0);
console.log(buf.toString('utf8')); // "\u0012\u0005"
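Incidentally, writeUInt16LE stored the raw 16-bit value, which happens to be exactly the UTF-16LE encoding of the character, so decoding with that encoding (Node also accepts the alias 'ucs2') does recover it:
var buf = new Buffer(2);
buf.writeUInt16LE(0x0512, 0);
console.log(buf.toString('utf16le')); // 'Ԓ'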
You can detour with String.fromCharCode() to see the UTF-8 encoding:
var buf = new Buffer(String.fromCharCode(0x0512), 'utf8');
console.log(buf); // <Buffer d4 92>
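Reading those UTF-8 bytes back completes the round trip:
console.log(new Buffer([0xd4, 0x92]).toString('utf8')); // 'Ԓ'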