 

How to correctly decode text files from FileSystemReadStream in Pharo 1.4

In Pharo 1.4 I opened a FileSystemReadStream on a text file and converted its contents to a String via aFileSystemReadStream contents asString.

My text files are UTF-8 encoded and use Windows (CR LF) line breaks.

The resulting Pharo Strings have two line breaks per text file line and some weird characters in place of German umlauts such as Ä, Ö, Ü.

How can I correctly decode my text files in Pharo?

Helene Bilbo asked Jul 31 '12 11:07





1 Answer

Don't use FileSystemReadStream in 1.4; it is incomplete and buggy ;). Use FileStream instead.

multiByteFileStream := FileStream fileNamed: '/foo/bar.txt'.
multiByteFileStream contents.

FileStream fileNamed: returns a MultiByteFileStream, on which you can set the line end convention and the encoding before reading the contents:

multiByteFileStream 
    "possible values are: #cr #lf #crlf"
    lineEndConvention: #cr;
    "set a specific converter, see subclasses of TextConverter"
    converter: UTF8TextConverter new. 
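
Putting the two snippets together for the files in the question, a minimal workspace sketch could look like the following. It assumes that the double line breaks come from CR and LF each being treated as a line break, and that the weird characters are UTF-8 bytes that were never decoded. The path is the placeholder from above; readOnlyFileNamed:, the #crlf setting, and the ensure: block that closes the stream are additions for illustration, not part of the original snippets.

```smalltalk
| stream text |
"Assumption: open read-only, since the file is only being decoded here."
stream := FileStream readOnlyFileNamed: '/foo/bar.txt'.
text := [
    stream
        "assumption: #crlf, because the files use Windows CR LF line breaks"
        lineEndConvention: #crlf;
        "decode the bytes as UTF-8 so that umlauts arrive as single characters"
        converter: UTF8TextConverter new.
    "read only after the stream has been configured"
    stream contents ]
        ensure: [ stream close ].
text
```

Configuring the stream before calling contents matters here, because the converter and the line end convention are applied while the stream is being read.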
camillobruni answered Oct 21 '22 19:10
