 

\r\n translated to \r\r\n in Haskell

I'm on Windows 7 64-bit.

My program needs to retrieve some UTF-8 encoded text from an external source, do some processing on it, then save it to disk. The original text uses the "\r\n" sequence to represent newlines (I am happy to keep it that way).

The issue: when I use Data.Text.IO.writeFile, each "\r\n" sequence is written as "\r\r\n". In other words, every '\n' is translated to "\r\n", even when it is already preceded by '\r' in the original text. I understand that when writing to a file on Windows, '\n' should be translated to "\r\n" when it is not already preceded by '\r', but translating "\r\n" to "\r\r\n" does not seem right.

Using Data.ByteString.writeFile on the encodeUtf8 version of the text works well, though (no extra "\r" is inserted inside a "\r\n" sequence).

A simple example:

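{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.IO as T (writeFile)
import qualified Data.Text.Encoding as T (encodeUtf8)

-- Three lines separated by Windows-style "\r\n" line endings.
str :: T.Text
str = "Line 1 is here\r\nLine 2 is here\r\nLine 3 is here"

main :: IO ()
main = do
    -- Encode to UTF-8 explicitly and write raw bytes: "\r\n" is kept as-is.
    B.writeFile "byt.bin" $ T.encodeUtf8 str
    -- Write through a text-mode Handle: on Windows each '\n' becomes "\r\n".
    T.writeFile "txt.bin" str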

Looking at each file produced by this code with a hex editor, one can see an extra 0x0D byte added in front of each 0x0A in the file produced by the T.writeFile line.

(Hex-editor screenshots of the B.writeFile output and of the T.writeFile output were shown here.)
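The effect can also be reproduced without a hex editor. The sketch below assumes the text and bytestring packages; the helper writeWithCrlf and the file name crlf-demo.txt are made up for illustration. It forces CRLF output translation on a handle (what a text-mode handle uses by default on Windows), so the doubling shows up on any OS, then reads the raw bytes back:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: reproduce "\r\n" -> "\r\r\n" on any OS by forcing CRLF output
-- translation, then inspect the bytes that actually reached the disk.
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.IO

-- The newline mode a Windows text-mode handle uses by default.
crlfMode :: NewlineMode
crlfMode = NewlineMode { inputNL = CRLF, outputNL = CRLF }

-- Hypothetical helper: write through a CRLF-translating handle, return raw bytes.
writeWithCrlf :: FilePath -> T.Text -> IO B.ByteString
writeWithCrlf path txt = do
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hSetNewlineMode h crlfMode
    TIO.hPutStr h txt
  B.readFile path

main :: IO ()
main = do
  bytes <- writeWithCrlf "crlf-demo.txt" "A\r\nB"
  -- Every '\n' gains a '\r', even the one already preceded by '\r'.
  print (B.unpack bytes) -- prints [65,13,13,10,66]
```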

My question: what did I do wrong? Is there a way to use T.writeFile on Windows without getting "\r\n" translated to "\r\r\n"?

asked Jun 23 '15 at 08:06 by Janthelme


1 Answer

Your answer is in the Data.Text.IO documentation:

Beginning with GHC 6.12, text I/O is performed using the system or handle's current locale and line ending conventions.
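To see which conventions the runtime will apply on a given machine, a small sketch like the following (assuming only base) prints the native newline convention and the default locale encoding. The values depend on the OS and locale, so no output is assumed here:

```haskell
-- Sketch: print the defaults that a freshly opened text-mode handle inherits.
import System.IO

main :: IO ()
main = do
  -- CRLF on Windows, LF on POSIX systems.
  print nativeNewline
  -- The runtime's locale encoding (often a code page on Windows).
  print localeEncoding
  -- The encoding actually in effect on stdout (Nothing means binary mode).
  hGetEncoding stdout >>= print
```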

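Since you do not open the handle yourself, Data.Text.IO.writeFile opens the file in text mode, and a text-mode handle uses the platform's native newline mode. On Windows that mode translates every '\n' into "\r\n" on output without looking at the preceding character, which is exactly the "\r\n" to "\r\r\n" doubling you see. (The translation is done by GHC's Handle layer, not by the operating system.) To prevent it, open the handle yourself and call hSetNewlineMode h noNewlineTranslation before writing with Data.Text.IO.hPutStr. Opening the file with openBinaryFile also disables the translation, but a binary handle uses the char8 encoding, which writes each character as its code point modulo 256, so non-ASCII characters would no longer come out as UTF-8.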
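A minimal sketch of the text-mode approach, assuming the text and bytestring packages; writeFileVerbatim is a hypothetical helper name. It keeps a text-mode handle but sets the encoding to UTF-8 explicitly and turns newline translation off, which avoids the char8 problem of a binary handle:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: write Text as UTF-8 with no newline translation at all.
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Encoding (encodeUtf8)
import System.IO

-- Hypothetical helper: like Data.Text.IO.writeFile, but byte-for-byte UTF-8.
writeFileVerbatim :: FilePath -> T.Text -> IO ()
writeFileVerbatim path txt =
  withFile path WriteMode $ \h -> do
    -- Encode explicitly instead of relying on the locale encoding.
    hSetEncoding h utf8
    -- Disable the '\n' -> "\r\n" output translation used on Windows.
    hSetNewlineMode h noNewlineTranslation
    TIO.hPutStr h txt

main :: IO ()
main = do
  let str = "Line 1 is here\r\nLine 2 is here\r\nLine 3 is here" :: T.Text
  writeFileVerbatim "txt.bin" str
  -- The file now holds exactly the UTF-8 encoding of the text.
  bytes <- B.readFile "txt.bin"
  print (bytes == encodeUtf8 str) -- prints True
```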

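However, a text-mode handle also encodes using the runtime's locale encoding, which on Windows is usually a code page such as CP1252 rather than UTF-8, so that might not be what you want either. Depending on your scenario, encoding and decoding the text explicitly, as you already do with encodeUtf8 and ByteString, might be the better idea.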
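A sketch of that explicit route, assuming the text and bytestring packages; writeUtf8File and readUtf8File are hypothetical helper names. Because the handle only ever sees bytes, neither the locale encoding nor newline translation is involved, and "\r\n" pairs and non-ASCII characters round-trip unchanged:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: do the UTF-8 conversion yourself and hand raw bytes to the handle.
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8, encodeUtf8)

-- Hypothetical helper: encode explicitly, then write bytes untouched.
writeUtf8File :: FilePath -> T.Text -> IO ()
writeUtf8File path = B.writeFile path . encodeUtf8

-- Hypothetical helper: decodeUtf8 throws on invalid UTF-8
-- (decodeUtf8' returns an Either instead).
readUtf8File :: FilePath -> IO T.Text
readUtf8File path = decodeUtf8 <$> B.readFile path

main :: IO ()
main = do
  let str = "caf\233\r\nna\239ve" :: T.Text
  writeUtf8File "byt.bin" str
  back <- readUtf8File "byt.bin"
  -- Both the "\r\n" and the accented characters survive.
  print (back == str) -- prints True
```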

answered Sep 29 '22 at 02:09 by Niklas B.