Unicode and JavaScript: Invalid byte sequences

Some byte sequences are invalid in Unicode encodings such as UTF-8, and I know that some languages (Python, for one) throw an error when decoding them.

My question is: what happens in JavaScript when such a sequence is received during an XMLHttpRequest or XDomainRequest? Does the resulting string:

  1. Get truncated when that happens?
  2. Skip the bad sequence and start at the next byte(s)?
  3. Continue decoding and only show the replacement � character when displayed?

If 3, then does the charCodeAt function return a valid character code?

Asked by F.X. on Oct 06 '22

1 Answer

Number 3 happens: decoding continues, and each invalid sequence is replaced with the Unicode replacement character U+FFFD, which shows up as � when displayed. For those positions, charCodeAt returns 0xFFFD (65533), the code of the replacement character.

Answered by saml on Oct 09 '22