Unicode in Rhino

For some reason, Unicode strings don't behave properly in Rhino, Mozilla's JavaScript engine. If I enter Unicode text in the REPL, or manipulate it, the REPL prints gibberish.

js> 'тотальная киборгизация'
B>B0;L=0O :81>@3870F8O

ASCII characters work just fine.

js> 'reprap for everyone'
reprap for everyone
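The garbled Cyrillic line is not random. Assuming the REPL writes each 16-bit character as a single byte and drops the high byte, the output can be reproduced exactly. The sketch below illustrates that assumption; `truncateToLowByte` is a hypothetical helper, not a Rhino function.

```javascript
// Hypothetical helper (not part of Rhino): keeps only the low 8 bits of each
// UTF-16 code unit, mimicking a 16-bit char written out as a single byte.
function truncateToLowByte(s) {
  var out = '';
  for (var i = 0; i < s.length; i++) {
    // charCodeAt returns the 16-bit code unit; & 0xFF discards the high byte.
    out += String.fromCharCode(s.charCodeAt(i) & 0xFF);
  }
  return out;
}

// 'т' is U+0442, so its low byte 0x42 is 'B'; 'о' is U+043E, so 0x3E is '>'.
var garbled = truncateToLowByte('тотальная киборгизация');
// garbled is 'B>B0;L=0O :81>@3870F8O', the same string the REPL printed.
```

The lowercase Cyrillic letters here lie in U+0430 to U+044F, so their low bytes (0x30 to 0x4F) are all printable ASCII digits, punctuation and capitals. ASCII input survives unchanged because every ASCII code unit is already below 0x100.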

Unix commands work fine too:

$ echo 'тотальная киборгизация'
тотальная киборгизация

JVM output is fine too: running class Test { public static void main(String[] args) { System.out.println("тотальная киборгизация"); } } prints the Cyrillic text correctly.

Java and Rhino versions are:

$ java -version
java version "1.7.0_09"
OpenJDK Runtime Environment (IcedTea7 2.3.3) (7u9-2.3.3-0ubuntu1~12.10.1)
OpenJDK 64-Bit Server VM (build 23.2-b09, mixed mode)
$ rhino
Rhino 1.7 release 3 2012 05 18

Locales:

$ echo $LC_TYPE

$ echo $LANG
en_US.UTF-8

Changing LC_ALL to en_US.UTF-8 doesn't help.

Is this problem related to the StackOverflow question "Javascript using UCS-2"?

What's the problem, and how can I use proper Unicode in Rhino REPL?

asked Dec 13 '12 14:12 by Mirzhan Irkegulov


1 Answer

It should be noted that JavaScript doesn't handle Unicode entirely properly, since it predates UTF-16. (It uses a similar 16-bit encoding, essentially UCS-2, but certainly not the same one.)
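A minimal sketch of the 16-bit code units mentioned above, using only standard string methods: `length` counts code units, so a character outside the Basic Multilingual Plane (the emoji U+1F600 is chosen purely as an example) takes two units as a surrogate pair. Cyrillic letters all lie inside the BMP, so each one is a single code unit.

```javascript
// U+1F600 GRINNING FACE, written as its UTF-16 surrogate pair.
var smiley = '\uD83D\uDE00';

var smileyLength = smiley.length;                   // 2 code units, not 1 character
var highUnit = smiley.charCodeAt(0).toString(16);   // 'd83d' (high surrogate)
var lowUnit = smiley.charCodeAt(1).toString(16);    // 'de00' (low surrogate)

// Each Cyrillic letter is a single 16-bit code unit inside the BMP.
var cyrillicLength = 'тотальная'.length;            // 9
```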

This writeup explains the problem well and provides libraries and workarounds.

answered Oct 08 '22 13:10 by Jeremy J Starcher