Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reading file with "US-ASCII" encoding in Haskell: hGetContents: invalid argument (invalid byte sequence)

I'm using Haskell for programming a parser, but this error is a wall I can't pass. Here is my code:

main = do
  arguments    <- getArgs
  let fileName = head arguments
  fileContents <- readFile fileName
  converter    <- open "UTF-8" Nothing
  let titleLength           = length fileName
      titleWithoutExtension = take (titleLength - 4) fileName
      allNonEmptyLines      = unlines $ tail $ filter (/= "") $ lines fileContents

When I try to read a file with "US-ASCII" encoding I get the famous error hGetContents: invalid argument (invalid byte sequence). I've tried to change the "UTF-8" in my code by "US-ASCII", but the error persist. Is there a way for reading this files, or any kind of file handling encoding problems?

like image 361
freinn Avatar asked Nov 06 '15 20:11

freinn


1 Answers

You should hSetEncoding to configure the file handle for a specific text encoding, e.g.:

import System.Environment
import System.IO

main = do
  (path : _) <- getArgs
  h <- openFile path ReadMode
  hSetEncoding h latin1
  contents <- hGetContents h
  -- no need to close h
  putStrLn $ show $ length contents

If your file contains non-ASCII characters and it's not UTF8 encoded, then latin1 is a good bet although it's not the only possibility.

like image 170
ErikR Avatar answered Nov 04 '22 17:11

ErikR