LoadFromFile with Unicode data

My input file (f) contains some Unicode (Swedish) text that isn't being read correctly.

Neither of these approaches works, although they give different results:

  LoadFromFile(f);

or

  LoadFromFile(f,TEncoding.GetEncoding(GetOEMCP));

I'm using Delphi XE.

How can I use LoadFromFile with Unicode data? And how do I subsequently SaveToFile? Thanks

asked May 12 '12 at 16:05 by bobonwhidbey


2 Answers

In order to load a Unicode text file you need to know its encoding. If the file has a Byte Order Mark (BOM), then you can simply call LoadFromFile(FileName) and the RTL will use the BOM to determine the encoding.
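To check which case applies to a particular file, you can look at its first few bytes. The program below is a minimal sketch for illustration, not the RTL's own detection code: the program name and the DetectBOM helper are hypothetical, and it only checks the UTF-8, UTF-16 LE and UTF-16 BE signatures, which as far as I know are the ones the XE RTL recognizes. Note that a UTF-32 LE BOM also starts with FF FE, so this sketch would report such a file as UTF-16 LE.

```delphi
program DetectBom;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: names the BOM found at the start of Head, if any.
// Only the UTF-8 and UTF-16 signatures are checked.
function DetectBOM(const Head: array of Byte): string;
begin
  if (Length(Head) >= 3) and (Head[0] = $EF) and (Head[1] = $BB) and (Head[2] = $BF) then
    Result := 'UTF-8'
  else if (Length(Head) >= 2) and (Head[0] = $FF) and (Head[1] = $FE) then
    Result := 'UTF-16 LE'
  else if (Length(Head) >= 2) and (Head[0] = $FE) and (Head[1] = $FF) then
    Result := 'UTF-16 BE'
  else
    Result := 'none'; // no BOM: the encoding must be passed to LoadFromFile explicitly
end;

var
  Stream: TFileStream;
  Head: array[0..2] of Byte;
  N: Integer;
begin
  if ParamCount < 1 then
    Writeln('usage: DetectBom <file>')
  else
  begin
    // Read at most the first three bytes of the file named on the command line.
    Stream := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
    try
      N := Stream.Read(Head, SizeOf(Head));
    finally
      Stream.Free;
    end;
    // Pass only the bytes actually read (the file may be shorter than 3 bytes).
    Writeln(DetectBOM(Slice(Head, N)));
  end;
end.
```

If this prints "none", the file has no BOM and a plain LoadFromFile(FileName) has nothing to go on.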

If the file does not have a BOM, then you need to specify the encoding explicitly, for example:

LoadFromFile(FileName, TEncoding.UTF8);
LoadFromFile(FileName, TEncoding.Unicode);//UTF-16 LE
LoadFromFile(FileName, TEncoding.BigEndianUnicode);//UTF-16 BE
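As a self-contained illustration of the BOM-less case, the sketch below writes raw UTF-8 bytes with no BOM and then loads them with an explicit TEncoding.UTF8. The file name nobom.txt is hypothetical. As far as I know, in XE a plain LoadFromFile(FileName) on such a file falls back to TEncoding.Default (the ANSI code page), which is one way to end up with mangled Swedish letters.

```delphi
program LoadUtf8NoBom;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  FileName = 'nobom.txt'; // hypothetical demo file name
var
  Expected: string;
  Bytes: TBytes;
  Stream: TFileStream;
  Lines: TStringList;
begin
  // "Räksmörgås", built from #$ escapes so the demo does not depend on
  // the encoding of this source file.
  Expected := 'R' + #$00E4 + 'ksm' + #$00F6 + 'rg' + #$00E5 + 's';

  // Write the UTF-8 bytes with no BOM in front of them.
  Bytes := TEncoding.UTF8.GetBytes(Expected);
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    Stream.WriteBuffer(Bytes[0], Length(Bytes));
  finally
    Stream.Free;
  end;

  Lines := TStringList.Create;
  try
    // No BOM, so the encoding is passed explicitly.
    Lines.LoadFromFile(FileName, TEncoding.UTF8);
    if Lines[0] = Expected then
      Writeln('UTF-8 decoded correctly')
    else
      Writeln('text was mis-decoded');
  finally
    Lines.Free;
  end;
end.
```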

For some reason, unknown to me, there is no built-in support for UTF-32, but if you had such a file then it would be easy enough to add a TEncoding instance to handle that.

answered Sep 30 '22 at 10:09 by David Heffernan


I assume that you mean 'UTF-8' when you say 'Unicode'.

If you know that the file is UTF-8, then do

LoadFromFile(f, TEncoding.UTF8);

To save:

SaveToFile(f, TEncoding.UTF8);
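A minimal round-trip sketch with TStringList, assuming a hypothetical file name swedish.txt: save with TEncoding.UTF8, then reload without naming an encoding. To my knowledge, SaveToFile in XE writes the encoding's preamble, so the saved file starts with the UTF-8 BOM and a later plain LoadFromFile can detect it.

```delphi
program Utf8RoundTrip;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  FileName = 'swedish.txt'; // hypothetical demo file name
var
  Expected: string;
  Lines: TStringList;
begin
  // "Räksmörgås", built from #$ escapes so the demo does not depend on
  // the encoding of this source file.
  Expected := 'R' + #$00E4 + 'ksm' + #$00F6 + 'rg' + #$00E5 + 's';
  Lines := TStringList.Create;
  try
    Lines.Add(Expected);
    // Save as UTF-8; the UTF-8 preamble (BOM) is written first.
    Lines.SaveToFile(FileName, TEncoding.UTF8);

    // Reload without an encoding argument: the BOM identifies the file as UTF-8.
    Lines.Clear;
    Lines.LoadFromFile(FileName);
    if (Lines.Count = 1) and (Lines[0] = Expected) then
      Writeln('round trip OK')
    else
      Writeln('round trip failed');
  finally
    Lines.Free;
  end;
end.
```

Passing TEncoding.UTF8 to LoadFromFile as well is harmless here; the point of the sketch is that the BOM written on save lets the plain overload work too.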

(The GetOEMCP WinAPI function returns the legacy OEM code page used by the DOS console, a pre-Unicode character set, so it is not suitable for reading Unicode files.)

answered Sep 30 '22 at 11:09 by Andreas Rejbrand