
System.Directory.getDirectoryContents unicode support

The following code prints something like °Ð½Ð´Ð¸Ñ-ÐÑпаниÑ

getDirectoryContents "path/to/directory/that/contains/files/with/nonASCII/names"
  >>= mapM_ putStrLn

This looks like a GHC bug that has already been fixed in the repository. But what should be done until everybody upgrades GHC?

The last time I ran into this problem (a few years ago), I used the utf8-string package to convert strings, but I don't remember how I did it, and GHC's Unicode support has changed noticeably in recent years.

So, what is the best (or at least a working) way to get directory contents with full Unicode support?

GHC version 7.0.4, locale en_US.UTF-8

Yuras asked Feb 24 '23 12:02


1 Answer

Here's a simple workaround using decodeString and encodeString from utf8-string.

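import System.Directory (getDirectoryContents)
import qualified Codec.Binary.UTF8.String as UTF8

main :: IO ()
main = do
  -- Decode the names returned by getDirectoryContents before printing them
  getDirectoryContents "." >>= mapM_ (putStrLn . UTF8.decodeString)
  putStrLn "------------"
  -- Encode the name to UTF-8 bytes before passing it to readFile
  readFile (UTF8.encodeString "brøken-file-nåme.txt") >>= putStrLn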

Output:

.
..
brøken-file-nåme.txt
Broken.hs
------------
hello
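
Why this works can be seen in a small pure sketch. It assumes, as the garbled output in the question suggests, that the affected GHC returns file names with one Char per raw UTF-8 byte. The name rawName is hypothetical and only stands in for such a result; decodeString and encodeString are the utf8-string functions used above.

```haskell
import qualified Codec.Binary.UTF8.String as UTF8

-- Hypothetical stand-in for what the affected GHC is assumed to return
-- for a file named "ø": the two UTF-8 bytes 0xC3 0xB8, one Char each.
rawName :: String
rawName = "\195\184"

main :: IO ()
main = do
  -- decodeString reinterprets the byte-valued Chars as UTF-8 text
  print (UTF8.decodeString rawName == "\248")  -- '\248' is U+00F8, 'ø'
  -- encodeString goes the other way: one Char per UTF-8 byte
  print (UTF8.encodeString "\248" == rawName)
```

Both comparisons print True. On a GHC where the bug is fixed, the names should already be decoded, so applying decodeString a second time would likely garble them; the workaround is only meant for the affected versions.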
hammar answered Feb 26 '23 03:02