I'm on Windows 7 64-bit.
My program needs to retrieve some text (UTF-8 encoded) from an external source, do some things with it, then save it to disk. The original text uses the "\r\n" sequence to represent newlines (I am happy to keep it that way).
The issue: when using Data.Text.IO.writeFile, each "\r\n" sequence seems to be written as "\r\r\n". That is, every '\n' is translated to "\r\n", even when it is already preceded by '\r' in the original text. I understand that, when writing to a file on Windows, '\n' should be translated to "\r\n" when it is not already preceded by '\r', but the translation of "\r\n" to "\r\r\n" does not seem right.
Using Data.ByteString.writeFile on the encodeUtf8 version of the text works well, though (no extra '\r' is inserted inside a "\r\n" sequence).
A simple example :
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.IO as T (writeFile)
import qualified Data.Text.Encoding as T (encodeUtf8)
str = "Line 1 is here\r\nLine 2 is here\r\nLine 3 is here" :: T.Text
main = do
  B.writeFile "byt.bin" $ T.encodeUtf8 str
  T.writeFile "txt.bin" str
Looking at each file produced by this code with a hex editor, one can see the extra 0x0D added in front of each 0x0A in the file produced via the T.writeFile line.
B.writeFile : each line ending is stored as 0D 0A
T.writeFile : each line ending is stored as 0D 0D 0A
My question : What did I do wrong? Is there a way to use T.writeFile on Windows, and not get "\r\n" translated to "\r\r\n"?
Your answer is in the docs:
Beginning with GHC 6.12, text I/O is performed using the system or handle's current locale and line ending conventions.
Seeing as you do not open the handle yourself, it seems very likely that the library opens the file in text mode. On Windows, a text-mode handle translates every '\n' to "\r\n" on output, regardless of what precedes it. What you could do instead is open the file in binary mode using openBinaryFile and then use Data.Text.IO.hPutStr to prevent this.
However, having the text encoded according to the system or handle settings might also not be what you want. Depending on your scenario, encoding/decoding the string explicitly, as you do with ByteStrings, might be the better idea.
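As a minimal sketch of the binary-mode suggestion, the code below opens the file with withBinaryFile and writes the text through Data.Text.IO.hPutStr. The helper name writeTextVerbatim is hypothetical and not part of the text package. Because a binary-mode handle does not encode as UTF-8 by default, the sketch also assumes you want UTF-8 and selects it with hSetEncoding.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as T (encodeUtf8)
import qualified Data.Text.IO as T (hPutStr)
import System.IO (IOMode (WriteMode), hSetEncoding, utf8, withBinaryFile)

-- Hypothetical helper (an assumption for illustration, not a library function):
-- write Text as UTF-8 with no newline translation.
writeTextVerbatim :: FilePath -> T.Text -> IO ()
writeTextVerbatim path txt =
  withBinaryFile path WriteMode $ \h -> do
    -- Binary-mode handles do no newline translation, but they treat each
    -- Char as a single byte by default, so select UTF-8 explicitly.
    hSetEncoding h utf8
    T.hPutStr h txt

main :: IO ()
main = do
  let str = "Line 1 is here\r\nLine 2 is here\r\nLine 3 is here" :: T.Text
  writeTextVerbatim "txt.bin" str
  bytes <- B.readFile "txt.bin"
  -- True when the file holds exactly the UTF-8 bytes of str
  print (bytes == T.encodeUtf8 str)
```

Another option under the same assumptions is to keep a text-mode handle and call hSetNewlineMode with noNewlineTranslation on it; either way the "\r\n" sequences already in the text should reach the file unchanged.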