
Caveats when encoding a C# string to a JavaScript string

I'm trying to write a custom JavaScript MVC3 helper class for my project, and one of its methods is supposed to escape C# strings into JavaScript strings.

I know C# strings are UTF-16 encoded, and Javascript strings also seem to be UTF-16. No problem here.

I know some characters, like the backslash, single quote and double quote, must be backslash-escaped in JavaScript, so:

\ becomes \\
' becomes \'
" becomes \"

Are there any other caveats I must be aware of before writing my conversion method?

EDIT: Great answers so far, I'm adding some references from the answers in the question to help others in the future.

Alex K. suggested using System.Web.HttpUtility.JavaScriptStringEncode, which I marked as the right answer for me because I'm using .NET 4. That function is not available in earlier .NET versions, though, so I'm adding some other resources here:

CR  becomes \r   // a JavaScript string literal cannot span more than one line
LF  becomes \n   // a JavaScript string literal cannot span more than one line
TAB becomes \t

Control characters must be hex-escaped (for example, \x07 or \u0007).
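
The rules collected above can be sketched as one small function. This is only an illustration, written in JavaScript so the result can be checked by evaluating it back; the same table should port directly to a C# helper. The name escapeJsString is hypothetical, and escaping U+2028 and U+2029 is an addition not mentioned above (engines older than ES2019 treat them as line terminators inside string literals).

```javascript
// Hypothetical helper: escapes text so it can sit between quotes in a JavaScript string literal.
const NAMED_ESCAPES = new Map([
  ['\\', '\\\\'],
  ["'", "\\'"],
  ['"', '\\"'],
  ['\r', '\\r'],
  ['\n', '\\n'],
  ['\t', '\\t'],
]);

function escapeJsString(input) {
  let out = '';
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    const code = input.charCodeAt(i); // one UTF-16 code unit
    if (NAMED_ESCAPES.has(ch)) {
      out += NAMED_ESCAPES.get(ch);
    } else if (code < 0x20 || code === 0x7f || code === 0x2028 || code === 0x2029) {
      // Remaining control characters (and the two Unicode line separators,
      // an assumption added here) become \uXXXX escapes.
      out += '\\u' + code.toString(16).padStart(4, '0');
    } else {
      out += ch;
    }
  }
  return out;
}

// Prints: line1\nline2 \"q\" \u0007
console.log(escapeJsString('line1\nline2 "q" \u0007'));
```

Everything else, including surrogate halves, is passed through unchanged, so a string escaped this way and wrapped in quotes should evaluate back to the original text.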

JP Richardson gave an interesting link explaining that JavaScript uses UCS-2, which is a subset of UTF-16, but how to encode this correctly is an entirely new question.

LukeH, in the comments below, reminded me of the CR, LF and TAB characters, and that in turn reminded me of the control characters (BEEP, NULL, ACK, etc.).

Machado asked Feb 23 '12 12:02


2 Answers

(.NET 4) You can use:

System.Web.HttpUtility.JavaScriptStringEncode(@"aa\bb ""cc"" dd\tee", true);
== 
"aa\\bb \"cc\" dd\\tee"

Alex K. answered Nov 01 '22 09:11


It's my understanding that you do have to be careful, as JavaScript is not strictly UTF-16; rather, it exposes UCS-2, which I believe is a subset of UTF-16. What this means for you is that any character with a code point above 0xFFFF (that is, one that does not fit in a single 2-byte unit) could give you problems in JavaScript.

In summary, under the covers the engine may use UTF-16, but it only exposes UCS-2-like methods.
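
A small sketch of what "UCS-2-like" looks like in practice, using U+1F600 as an arbitrary example of a character above 0xFFFF. The variable names are made up for illustration. Note that codePointAt and code-point iteration were only added in ES2015, so older engines have just the code-unit view.

```javascript
// U+1F600 lies above 0xFFFF, so UTF-16 stores it as a surrogate pair.
const face = '\uD83D\uDE00';

const report = {
  length: face.length,                         // counts UTF-16 code units, not characters
  firstUnit: face.charCodeAt(0).toString(16),  // high surrogate only
  secondUnit: face.charCodeAt(1).toString(16), // low surrogate only
  codePoint: face.codePointAt(0).toString(16), // ES2015: decodes the whole pair
  symbols: Array.from(face).length,            // ES2015: iteration is by code point
};

// Prints: { length: 2, firstUnit: 'd83d', secondUnit: 'de00', codePoint: '1f600', symbols: 1 }
console.log(report);
```

Because a C# string is also a sequence of UTF-16 code units, copying or \u-escaping it unit by unit should keep such pairs intact; trouble tends to start only when code splits, truncates, or counts the string by index.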

Great article on the issue: http://mathiasbynens.be/notes/javascript-encoding

JP Richardson answered Nov 01 '22 11:11