
Haskell IO with non English characters

Look at this; I am trying:

appendFile "out" $ show 'д'

'д' is a character from the Russian alphabet. After that, the "out" file contains:

'\1076'

As I understand it, this is the Unicode code point of the character 'д'. Why does this happen? And how can I get the normal representation of my character?

For what it's worth, this works fine:

appendFile "out"  "д"

Thanks.

Anton asked Aug 31 '10 17:08



3 Answers

show escapes all characters outside the ASCII range (and some inside the ASCII range), so don't use show.
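To make the escaping concrete, here is a small sketch of what show returns for a few characters. The names codePoint and examples are made up for this illustration. The point is that show yields a String containing Haskell literal syntax, and for 'д' that literal uses the decimal code point 1076.

```haskell
import Data.Char (ord)

-- The Unicode code point of a character, as a decimal number.
-- (codePoint is a hypothetical helper name, just a synonym for ord.)
codePoint :: Char -> Int
codePoint = ord

-- Pairs of a character and what show produces for it. Each result is a
-- String holding Haskell literal syntax, not the character itself.
-- '\n' is inside the ASCII range but is still escaped; 'д' is outside it.
examples :: [(Char, String)]
examples = [ (c, show c) | c <- ['a', '\n', 'д'] ]
```

In GHCi, mapM_ (putStrLn . snd) examples prints 'a', '\n' and '\1076' on three lines, which is exactly the text that ended up in the "out" file.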

Since "д" works fine, just use that. If you can't because the д is actually inside a variable, you can use [c] (where c is the variable containing the character). If you need to surround it with single quotes (as show does), you can use ['\'', c, '\''].
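A minimal sketch of that suggestion, assuming the character lives in a variable. The function names quoted and appendChar are invented for the example, and the file is written with whatever encoding the current locale gives the handle.

```haskell
-- Wrap a character in single quotes, like show does, but without
-- escaping it. (quoted is a hypothetical helper name.)
quoted :: Char -> String
quoted c = ['\'', c, '\'']

-- Append a single character to a file as a one-element String.
-- (appendChar is a hypothetical helper name; it relies on the
-- handle's default, locale-dependent encoding.)
appendChar :: FilePath -> Char -> IO ()
appendChar path c = appendFile path [c]
```

With these, appendChar "out" 'д' does the same as appendFile "out" "д", and appendFile "out" (quoted 'д') writes the character between single quotes.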

sepp2k answered Sep 23 '22 08:09



After reading your reply to my comment, I think your situation is that you have some data structure, maybe with type [(String,String)], and you'd like to output it for debugging purposes. Using show would be convenient, but it escapes non-ASCII characters.

The problem here isn't with Unicode; you need a function that will properly format your data for display. I don't think show is the right choice, in part because it escapes some characters. What you need is a type class like Show, but one that displays data for reading instead of escaping characters. That is, you need a pretty-printer: a library that provides functions to format data for display. There are several pretty-printers available on Hackage; I'd look at uulib or wl-pprint to start. I think either would be suitable without too much work.

Here's an example with the uulib tools. The Pretty type class is used instead of Show; the library comes with many useful instances.

import UU.PPrint

-- | Write each item to StdOut
logger :: Pretty a => a -> IO ()
logger x = putDoc $ pretty x <+> line

Running this in GHCi:

Prelude UU.PPrint> logger 'Д'
Д 
Prelude UU.PPrint> logger ('Д', "other text", 54)
(Д,other text,54) 
Prelude UU.PPrint> 

If you want to output to a file instead of the console, you can use the hPutDoc function to output to a handle. You could also call renderSimple to produce a SimpleDoc, then pattern match on the constructors to process output, but that's probably more trouble. Whatever you do, avoid show:

Prelude UU.PPrint> show $ pretty 'Д'
"\1044"

You could also write your own type class, similar to Show but formatted as you like it. The Text.Printf module can be helpful if you go this route.
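As a rough sketch of the "own type class" route, something like the following could work. The class name Display and its methods are hypothetical, not from any library, and only a handful of instances are shown. It borrows the showList trick from Show so that a String comes out verbatim instead of as a bracketed list of characters. It does not use Text.Printf.

```haskell
import Data.List (intercalate)

-- Hypothetical class: like Show, but renders values for reading,
-- so nothing is escaped or quoted.
class Display a where
  display :: a -> String
  -- Lists get their own hook so that String can be special-cased,
  -- mirroring how Show uses showList.
  displayList :: [a] -> String
  displayList xs = "[" ++ intercalate "," (map display xs) ++ "]"

instance Display Char where
  display c = [c]
  displayList s = s  -- a String is displayed as-is

instance Display Int where
  display = show

instance Display a => Display [a] where
  display = displayList

instance (Display a, Display b) => Display (a, b) where
  display (a, b) = "(" ++ display a ++ "," ++ display b ++ ")"
```

For a value of type [(String,String)], display returns a String such as [(ключ,значение)] with the Cyrillic text intact, which can then be passed to putStrLn or appendFile.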

John L answered Sep 20 '22 08:09



Use Data.Text. It provides IO with locale-awareness and encoding support.
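A minimal sketch of what that might look like, assuming the text package is available. The helper names charText and appendCharText are invented for the example. Data.Text.IO.appendFile writes through a handle using the locale's encoding by default.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Turn a single character into a Text value without any escaping.
-- (charText is a hypothetical helper name.)
charText :: Char -> T.Text
charText = T.singleton

-- Append one character to a file via Data.Text.IO.
-- (appendCharText is a hypothetical helper name; the encoding used
-- is the handle's default, which follows the current locale.)
appendCharText :: FilePath -> Char -> IO ()
appendCharText path c = TIO.appendFile path (charText c)
```

Calling appendCharText "out" 'д' would then append the character itself rather than its escaped form, provided the locale's encoding can represent it.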

Don Stewart answered Sep 22 '22 08:09
