
Base64 encoding for UTF-8 strings

I have RAD Studio XE5 and I am using Indy's EncodeString to encode the input string.

My code looks like this:

procedure TForm5.Button2Click(Sender: TObject);
var
  UTF8: UTF8String;
begin
  UTF8 := UTF8Encode(m1.Text);
  m2.Text := ind.EncodeString(UTF8);
end;

But the output is wrong for non-ASCII (UTF-8) input:

orange --> b3Jhbmdl [correct]
book --> Ym9vaw== [correct]
سلام --> Pz8/Pw== [wrong]
کتاب --> Pz8/Pw== [wrong]
دلفی --> Pz8/Pw== [wrong]

Every non-ASCII input returns the same output. What is wrong with my code, and how can I get a correct base64 encoding of UTF-8 strings?

asked Mar 06 '14 01:03 by peiman F.



2 Answers

Like @RRUZ said, EncodeString() expects you to specify a byte encoding that the input String will be converted to, and then those octets will be encoded to base64.

You are passing a UTF8String to EncodeString(), which takes a UnicodeString as input in XE5, so the RTL will convert the UTF8String data back to UTF-16, undoing your UTF8Encode() (which is deprecated, BTW). Since you are not specifying a byte encoding, Indy uses its default encoding, which is set to ASCII by default (configurable via the GIdDefaultTextEncoding variable in the IdGlobal unit).

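That is why orange works (every character is ASCII, so there is no data loss) but سلام fails: each non-ASCII character is replaced with ? during the ASCII conversion, so every four-character Persian input becomes ???? and encodes to the same Pz8/Pw==.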
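The byte-level effect can be reproduced outside Delphi. The following is a minimal sketch in Python, not Indy code: the helper name base64_of_text is hypothetical, and errors="replace" is assumed to mimic the lossy ASCII conversion described above. It shows that the base64 output depends entirely on which byte encoding is applied to the text first.

```python
import base64


def base64_of_text(text: str, encoding: str) -> str:
    """Convert text to bytes with the given encoding, then base64-encode the bytes."""
    # errors="replace" substitutes "?" for every character the encoding cannot
    # represent; this is an assumption standing in for a lossy ASCII conversion.
    raw = text.encode(encoding, errors="replace")
    return base64.b64encode(raw).decode("ascii")


# ASCII-only text gives the same result under either encoding.
print(base64_of_text("orange", "ascii"))
print(base64_of_text("orange", "utf-8"))

# Non-ASCII text collapses to "????" under ASCII, but keeps its bytes under UTF-8.
print(base64_of_text("سلام", "ascii"))
print(base64_of_text("سلام", "utf-8"))
```

Under ASCII, any four-character Persian word is reduced to the same four question marks, which is why the three failing inputs in the question share one output.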

You need to get rid of your UTF8String altogether, and let Indy handle the UTF-8 for you:

procedure TForm5.Button2Click(Sender: TObject);
begin
  m2.Text := TIdEncoderMIME.EncodeString(m1.Text, IndyTextEncoding_UTF8);
end;

DecodeString() has a similar parameter for specifying the byte encoding of the octets that have been base64 encoded. The input is first decoded to bytes, and then the bytes are converted to UnicodeString using the specified byte encoding, eg:

procedure TForm5.Button3Click(Sender: TObject);
begin
  m1.Text := TIdDecoderMIME.DecodeString(m2.Text, IndyTextEncoding_UTF8);
end;
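The decoding direction follows the same two steps in reverse. As a language-neutral sketch in Python (again not Indy code, with a hypothetical helper name text_from_base64), the base64 text is first turned back into bytes, and only then are the bytes interpreted with a byte encoding:

```python
import base64


def text_from_base64(b64_text: str, encoding: str) -> str:
    """Decode base64 to raw bytes, then interpret those bytes with the given encoding."""
    raw = base64.b64decode(b64_text)
    # errors="replace" keeps the sketch from raising on bytes that are invalid
    # in the chosen encoding; it is an illustrative choice, not Indy's behavior.
    return raw.decode(encoding, errors="replace")


# Round trip: UTF-8 bytes in, UTF-8 bytes out.
encoded = base64.b64encode("سلام".encode("utf-8")).decode("ascii")
restored = text_from_base64(encoded, "utf-8")
print(restored == "سلام")

# The lossy output from the question decodes to literal question marks.
print(text_from_base64("Pz8/Pw==", "utf-8"))
```

The original text comes back only when the encoding used for decoding matches the one used for encoding; once characters have been replaced with ?, no decoder setting can recover them.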
answered Oct 18 '22 15:10 by Remy Lebeau


You must call the EncodeString method passing a proper byte encoding class.

Try this:

m2.Text := TIdEncoderMIME.EncodeString(UTF8, IndyUTF8Encoding);

(IndyUTF8Encoding is defined in the IdGlobal unit.)

answered Oct 18 '22 16:10 by RRUZ