When using the Scala interpreter (i.e. running the command 'scala' on the command line), I am not able to print Unicode characters correctly. Of course a-z, A-Z, etc. are printed correctly, but for example € or ƒ is printed as a ?.
print(8364.toChar)
results in ? instead of €. Probably I'm doing something wrong. My terminal supports UTF-8 characters, and even when I pipe the output to a separate file and open it in a text editor, ? is displayed.
This is all happening on Mac OS X (Snow Leopard, 10.6.2) with Scala 2.8 (nightly build) and Java 1.6.0_17.
I found the cause of the problem, and a solution to make it work as it should.
As I suspected after posting my question, reading Calum's answer, and running into encoding issues on the Mac in another project (which was in Java), the cause of the problem is the default encoding used by Mac OS X. When you start the scala interpreter, it uses the default encoding of the platform it runs on. On Mac OS X this is MacRoman; on Windows it is probably CP1252. You can check this by typing the following command in the scala interpreter:
scala> System.getProperty("file.encoding");
res3: java.lang.String = MacRoman
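The same check can be done through java.nio.charset.Charset, and it is easy to see where a ? can come from: when a string is encoded with a charset that cannot represent a character, String.getBytes substitutes that charset's replacement byte. The sketch below is only an illustration; the object and method names are made up for this example, and US-ASCII is used as a stand-in for "a charset without €" rather than MacRoman itself.

```scala
import java.nio.charset.Charset

object DefaultEncodingCheck {
  // Name of the charset the JVM uses when no encoding is given explicitly.
  def defaultCharsetName: String = Charset.defaultCharset().name()

  // Encode text with a named charset and return the bytes as unsigned ints.
  // String.getBytes replaces unmappable characters with the charset's
  // replacement byte instead of throwing.
  def encode(text: String, charsetName: String): List[Int] =
    text.getBytes(charsetName).toList.map(_ & 0xFF)

  def main(args: Array[String]): Unit = {
    println("default charset: " + defaultCharsetName)
    // US-ASCII cannot represent the Euro sign, so it becomes 63, i.e. '?'.
    println("US-ASCII: " + encode("\u20AC", "US-ASCII"))
    // UTF-8 encodes the Euro sign as the three bytes E2 82 AC.
    println("UTF-8:    " + encode("\u20AC", "UTF-8"))
  }
}
```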
According to the scala help text, it is possible to provide Java properties using the -D option. However, this does not work for me. I ended up setting the environment variable
JAVA_OPTS="-Dfile.encoding=UTF-8"
After running scala, the previous command gives the following result:
scala> System.getProperty("file.encoding")
res0: java.lang.String = UTF-8
Now, printing special characters works as expected:
print(0x20AC.toChar)
€
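If changing file.encoding is not an option, another possibility is to write through a PrintStream that is created with an explicit encoding. This is a minimal sketch and not something I tested on the setup described above; the object and method names are hypothetical, and it only helps if the terminal itself expects UTF-8.

```scala
import java.io.{OutputStream, PrintStream}

object Utf8Out {
  // Wrap an OutputStream in an auto-flushing PrintStream that always
  // encodes as UTF-8, independent of the file.encoding property.
  def utf8Stream(target: OutputStream): PrintStream =
    new PrintStream(target, true, "UTF-8")

  def main(args: Array[String]): Unit = {
    val out = utf8Stream(System.out)
    // Writes the UTF-8 bytes of the Euro sign followed by a newline.
    out.println(0x20AC.toChar)
  }
}
```

The trade-off is that every print call has to go through this stream, whereas setting JAVA_OPTS fixes the default for the whole session.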
So, it is not a bug in Scala, but an issue with default encodings. In my opinion, it would be better if UTF-8 were used by default on all platforms. While searching for whether this is being considered, I came across a discussion on the Scala mailing list about this issue. In the first message, it is proposed to use UTF-8 by default on Mac OS X when file.encoding reports MacRoman, since UTF-8 is the default charset on Mac OS X (which keeps me wondering why file.encoding is set to MacRoman by default; perhaps it is inherited from Mac OS before version 10 was released?). I don't think this proposal will be part of Scala 2.8, since Martin Odersky wrote that it is probably best to keep things as they are in Java (i.e. honor the file.encoding property).
OK, at least part, if not all, of your problem here is that 128 is not the Unicode code point for the Euro sign. 128 (or 0x80, since hex seems to be the norm) is U+0080 <control>, i.e. it is not a printable character, so it's not surprising your terminal is having trouble printing it.
Euro's codepoint is 0x20AC (or in decimal 8364), and that appears to work for me (I'm on Linux, on a nightly of 2.8):
scala> print(0x20AC.toChar)
€
Another fun test is to print the Unicode snowman character:
scala> print(0x2603.toChar)
☃
128 as € is apparently an extended character from one of the Windows code pages.
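To illustrate where 128 comes from, here is a small sketch that decodes the single byte 0x80 with two different charsets. The object and method names are just for this example, and it assumes the JVM in use ships the windows-1252 charset.

```scala
object Cp1252Euro {
  // Decode one byte with the named charset and return the resulting code point.
  def decodeByte(b: Int, charsetName: String): Int =
    new String(Array(b.toByte), charsetName).codePointAt(0)

  def main(args: Array[String]): Unit = {
    // In windows-1252, byte 0x80 is the Euro sign, U+20AC.
    println(Integer.toHexString(decodeByte(0x80, "windows-1252")))
    // In ISO-8859-1, the same byte is the control character U+0080.
    println(Integer.toHexString(decodeByte(0x80, "ISO-8859-1")))
  }
}
```

So 128 only means € when it is treated as a byte in that code page; as a Unicode code point it is a different character.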
I got the other character you mentioned to work too:
scala> 'ƒ'.toInt
res8: Int = 402
scala> 402.toChar
res9: Char = ƒ