
Haskell: read in special characters from the console

I'd like to read in a string from the console that contains special characters like ö, ä, ü, µ, and so on. I've tried:

do ... ts <- getLine ...

but this doesn't work for those characters. For example, the Unicode code point for ö is 246 (written \246 in a Haskell string), but if I use getLine to read in ö, Haskell reads in "\195\182", and putStr "\195\182" gives me ö, which is not ö. What's the problem here? Do I need another function to read in those characters?

I am using WinGHCi 7.0.3 on Windows XP. I'd be glad if someone could help, because I haven't found anything so far.


@Judah Jacobson:

I tried it again, before typing any other commands, and got this:

Prelude> :m +System.IO
Prelude System.IO> hSetEncoding stdin utf8
Prelude System.IO> getLine
ασδφ
"\206\177\207\402\206\180\207\8224"
Prelude System.IO> putStr "\206\177\207\402\206\180\207\8224"
ασδφPrelude System.IO> 

I also tried the Windows command chcp 65001, but it didn't change anything; UTF-8 was already active in Windows.

Alex asked Sep 10 '11 13:09


3 Answers

Since GHC 6.12, strings read from and written to handles are decoded and encoded using the encoding from your locale setting, which is usually UTF-8 but may be something else. So make sure your locale is set to a UTF-8 one.

You can also manually control this stuff via the text package, which supports many other locale conventions and encodings.
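
As a rough sketch of what manual control could look like, the program below reads one line as raw bytes and decodes it explicitly as UTF-8, so the handle's own encoding is not involved. It assumes the text and bytestring packages are installed; decodeLine is a hypothetical helper name, and the assumption that the console really delivers UTF-8 bytes is mine, not something the packages guarantee.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.IO as TIO

-- Hypothetical helper: decode raw bytes as UTF-8 and report a
-- failure as a message instead of throwing an exception.
decodeLine :: BC.ByteString -> Either String T.Text
decodeLine bytes = either (Left . show) Right (TE.decodeUtf8' bytes)

main :: IO ()
main = do
  raw <- BC.getLine                -- raw bytes, no text decoding applied
  case decodeLine raw of
    Left err -> putStrLn ("input is not valid UTF-8: " ++ err)
    Right t  -> do
      print (T.unpack t)           -- shows the decoded code points
      TIO.putStrLn t               -- written using stdout's encoding
```

If the console sends bytes in some other encoding, the Left branch will usually fire, which at least makes the mismatch visible. On Windows the raw line may still end in a carriage return.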

Don Stewart answered Oct 21 '22 15:10


You need to set the encoding of stdin to UTF8. For me, this is set to CP437 initially in GHCi on Windows XP, and to UTF8 on Mac.

Check the current encoding with hGetEncoding stdin (from System.IO) and change it with hSetEncoding stdin utf8; after that, getLine should work.

Edit: This is what it looks like on my Mac:

Prelude System.IO> hSetEncoding stdin latin1
Prelude System.IO> str <- getLine
ö
Prelude System.IO> putStr str
öPrelude System.IO> print str
"\195\182"
Prelude System.IO> hSetEncoding stdin utf8
Prelude System.IO> str <- getLine
ö
Prelude System.IO> putStr str
öPrelude System.IO> print str
"\246"
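
The check-and-set steps above could also go into a compiled program rather than being typed at the prompt. This is a minimal sketch using only System.IO; describeEncoding is a hypothetical helper name, and setting stdout to UTF-8 as well is my assumption. Whether that is needed, or even helpful, depends on the terminal.

```haskell
import System.IO

-- Hypothetical helper: render the result of hGetEncoding.
-- Nothing means the handle is in binary mode.
describeEncoding :: Maybe TextEncoding -> String
describeEncoding = maybe "binary (no text encoding)" show

main :: IO ()
main = do
  before <- hGetEncoding stdin
  putStrLn ("stdin encoding before: " ++ describeEncoding before)
  hSetEncoding stdin utf8          -- decode console input as UTF-8
  hSetEncoding stdout utf8         -- assumption: the terminal displays UTF-8
  line <- getLine
  print line                       -- shows code points, e.g. for inspection
  putStrLn line
```

Printing the line with print as well as putStrLn makes it easy to tell whether the decoding or only the display is wrong.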

firefrorefiddle answered Oct 21 '22 15:10


I was able to reproduce your error; this looks like a bug in WinGHCi. By default, GHC on Windows uses the Win32 "console code page" to encode and decode Handle I/O. WinGHCi, however, sends input to GHC as UTF-8-encoded bytes while incorrectly leaving the code page set to 1252 (Windows-1252, a Latin-1 variant).
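
To see why such a mismatch produces strings like "\195\182", here is a small sketch that UTF-8-encodes each character by hand and then treats every resulting byte as its own character, which is what a Latin-1 decoder does. The names utf8Bytes and misdecodeLatin1 are hypothetical, and the sketch uses plain Latin-1 rather than code page 1252. The two differ for bytes 0x80 to 0x9F, which is why the question shows \402 and \8224 where this prints \131 and \134.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- UTF-8 encode one character, returning each byte as an Int.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    cont x = 0x80 .|. (x .&. 0x3F)   -- continuation byte: 10xxxxxx

-- Simulate decoding UTF-8 bytes as Latin-1: one Char per byte.
misdecodeLatin1 :: String -> String
misdecodeLatin1 = map chr . concatMap utf8Bytes

main :: IO ()
main = do
  print (misdecodeLatin1 "\246")              -- prints "\195\182"
  print (misdecodeLatin1 "\945\963\948\966")  -- the four Greek letters
```

The first line of output matches what getLine returned for ö in the question: the two UTF-8 bytes of ö each became a separate character.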

I was able to work around this bug using Mike Hartl's answer: run hSetEncoding stdin utf8 before performing any line-input commands. For example:

Prelude> :m +System.IO
Prelude System.IO> hSetEncoding stdin utf8
Prelude System.IO> getLine
ασδφ
"\945\963\948\966"

If that doesn't work for you, please let us know what you get when you run the above commands.

Alternatively, you will probably have better luck Unicode-wise with the "GHCi" program (which, admittedly, has a less polished interface).

Judah Jacobson answered Oct 21 '22 13:10