 

Read file with UTF-8 in Haskell as IO String

Tags:

haskell

utf-8

I have the following code, which works fine unless the file contains UTF-8 characters:

module Main where
import Ref
main = do
    text <- getLine
    theInput <- readFile text
    writeFile ("a"++text) (unlist . proc . lines $ theInput)

With UTF-8 characters I get this error: hGetContents: invalid argument (invalid byte sequence)
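The error means GHC tried to decode the file's bytes with some text encoding and hit bytes that are not valid in that encoding. readFile decodes with the encoding taken from the locale (LANG/LC_ALL on Unix-like systems). The following minimal sketch uses only base's System.IO and prints that default so you can see which encoding is in play. The output depends on your environment, so none is shown.

```haskell
-- Print the encodings GHC picked up from the locale at startup.
-- readFile/writeFile use the locale encoding unless told otherwise.
import System.IO (localeEncoding, hGetEncoding, stdin)

main :: IO ()
main = do
  putStrLn ("Locale (default file) encoding: " ++ show localeEncoding)
  enc <- hGetEncoding stdin          -- Nothing would mean binary mode
  putStrLn ("stdin encoding: " ++ show enc)
```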

Since the file I'm working with contains UTF-8 characters, I would like to handle this error so that I can still reuse the functions imported from Ref, if possible.

Is there a way to read a UTF-8 file as an IO String so I can reuse my Ref functions? What modifications should I make to my code? Thanks in advance.

Here are the function declarations from my Ref module:

unlist :: [String] -> String
proc :: [String] -> [String]

From the Prelude:

lines :: String -> [String]
asked Oct 30 '15 20:10 by George Peppa


3 Answers

This can be done with just GHC's System.IO module, which extends the one in the Haskell standard. It takes a few more function calls, though:

module Main where

import Ref
import System.IO

main :: IO ()
main = do
    text <- getLine
    inputHandle <- openFile text ReadMode
    hSetEncoding inputHandle utf8   -- decode the input as UTF-8 regardless of the locale
    theInput <- hGetContents inputHandle
    outputHandle <- openFile ("a"++text) WriteMode
    hSetEncoding outputHandle utf8  -- encode the output as UTF-8 as well
    hPutStr outputHandle (unlist . proc . lines $ theInput)
    hClose outputHandle -- flushes buffered output; GHC does not guarantee that at program exit
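You can test the hSetEncoding approach on its own, without Ref. The sketch below uses only base's System.IO. It writes a string containing non-ASCII characters with an explicit utf8 encoding and then reads it back the same way. The file name utf8-demo.txt and the helpers writeUtf8/readUtf8 are hypothetical. The sketch uses withFile so the handles are closed (and the output flushed) automatically. Because hGetContents is lazy, readUtf8 forces the whole string before withFile closes the handle.

```haskell
-- Round-trip a non-ASCII string through a file with an explicit UTF-8
-- encoding, independent of the locale. Helper and file names are hypothetical.
import System.IO

sample :: String
sample = "naïve café, Ørjan, 日本語\n"

writeUtf8 :: FilePath -> String -> IO ()
writeUtf8 path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h s                 -- withFile closes (and flushes) the handle

readUtf8 :: FilePath -> IO String
readUtf8 path = withFile path ReadMode $ \h -> do
  hSetEncoding h utf8
  s <- hGetContents h
  length s `seq` return s     -- force the lazy read before the handle closes

main :: IO ()
main = do
  writeUtf8 "utf8-demo.txt" sample
  back <- readUtf8 "utf8-demo.txt"
  print (back == sample)      -- prints True
```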
answered Oct 23 '22 04:10 by Ørjan Johansen


Thanks for the answers, but I found the solution myself. The file I was working with actually had this encoding:

ISO-8859 text, with CR line terminators

In other words, the file was not UTF-8 at all. To work with it from my Haskell code, it needed to have this encoding instead:

UTF-8 Unicode text, with CR line terminators
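There is another detail in that file output. "With CR line terminators" means each line ends in a bare carriage return ('\r'), but Prelude's lines only splits on '\n'. On such a file, lines returns the whole text as a single element. If proc expects one element per line, a helper like the hypothetical splitLines below (not part of the Prelude or Ref) could replace lines. This is a sketch that accepts "\r\n", "\r", or "\n" as the terminator.

```haskell
-- Hypothetical helper: split text into lines, treating "\r\n", "\r" or "\n"
-- as a line terminator. Prelude's lines only recognises '\n'.
splitLines :: String -> [String]
splitLines "" = []
splitLines s  =
  let (line, rest) = break (`elem` "\r\n") s
  in line : case rest of
       '\r' : '\n' : more -> splitLines more
       _        : more    -> splitLines more
       []                 -> []

main :: IO ()
main = do
  print (lines "one\rtwo\rthree")       -- ["one\rtwo\rthree"]
  print (splitLines "one\rtwo\rthree")  -- ["one","two","three"]
  print (splitLines "a\r\nb\nc\r")      -- ["a","b","c"]
```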

You can check a file's encoding with the file utility, like this:

$ file filename

To change the file's encoding, follow the instructions from this link!
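As an alternative to converting the file with an external tool, Haskell can do the conversion itself. This is a minimal sketch using only base's System.IO. It assumes the file is ISO-8859-1 (Latin-1), which base exposes as latin1. file only reports "ISO-8859", so for a different variant such as ISO-8859-15 you would need mkTextEncoding with that encoding's name instead. The function convertLatin1ToUtf8 and the demo file names are hypothetical. The same effect could also be had by calling hSetEncoding inputHandle latin1 in the System.IO answer above.

```haskell
-- Sketch: re-encode an ISO-8859-1 (Latin-1) file as UTF-8.
-- Function and file names are hypothetical.
import System.IO

convertLatin1ToUtf8 :: FilePath -> FilePath -> IO ()
convertLatin1ToUtf8 src dst = do
  input <- withFile src ReadMode $ \h -> do
    hSetEncoding h latin1    -- every byte is valid Latin-1, so decoding cannot fail
    s <- hGetContents h
    length s `seq` return s  -- read everything before the handle is closed
  withFile dst WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h input

main :: IO ()
main = do
  -- Create a small Latin-1 file: "café" where 'é' is the single byte 0xE9.
  withFile "latin1-demo.txt" WriteMode $ \h -> do
    hSetEncoding h latin1
    hPutStr h "caf\233\r"
  convertLatin1ToUtf8 "latin1-demo.txt" "converted-demo.txt"
  s <- withFile "converted-demo.txt" ReadMode $ \h -> do
    hSetEncoding h utf8
    t <- hGetContents h
    length t `seq` return t
  print (s == "caf\233\r")   -- prints True
```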

answered Oct 23 '22 04:10 by George Peppa


Use System.IO.Encoding, from the encoding package on Hackage.

Unicode handling has long been a well-known problem with the standard Haskell IO library.

{-# LANGUAGE ImplicitParams #-}
module Main where

import Prelude hiding (readFile, getLine, writeFile)
import System.IO.Encoding
import Data.Encoding.UTF8
import Ref

main :: IO ()
main = do
    let ?enc = UTF8   -- implicit parameter used by the encoding-aware IO functions
    text <- getLine
    theInput <- readFile text
    writeFile ("a" ++ text) (unlist . proc . lines $ theInput)
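If you would rather avoid an extra package, GHC's base library offers a comparable global switch. setLocaleEncoding from GHC.IO.Encoding (GHC-specific) changes the default encoding used by handles opened after the call. In the question's code, a single setLocaleEncoding utf8 at the start of main should then let the original readFile/writeFile lines work unchanged. This is a sketch under that assumption. Handles that already exist (stdin, stdout, stderr) keep their encoding unless you change them with hSetEncoding. The demo file name is hypothetical.

```haskell
-- Sketch using only base: make UTF-8 the default for every handle opened
-- from now on, so plain readFile/writeFile decode and encode UTF-8.
import GHC.IO.Encoding (setLocaleEncoding)
import System.IO (hSetEncoding, stdin, stdout, utf8)

main :: IO ()
main = do
  setLocaleEncoding utf8      -- affects files opened after this call
  hSetEncoding stdin utf8     -- std handles already exist, so set them explicitly
  hSetEncoding stdout utf8
  writeFile "global-utf8-demo.txt" "Ørjan\n"   -- hypothetical demo file
  s <- readFile "global-utf8-demo.txt"
  putStr s
```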
answered Oct 23 '22 04:10 by jazmit