
String to byte array in UTF-8?

How can I convert a WideString (or any other long string) to a UTF-8 byte array?

Mariusz asked Mar 08 '11 14:03


2 Answers

A function like this will do what you need:

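function UTF8Bytes(const s: UTF8String): TBytes;
begin
  // UTF-8 strings use one byte per code unit.
  Assert(StringElementSize(s)=1);
  SetLength(Result, Length(s));
  // Skip the copy for an empty string, where s[1] is invalid.
  if Length(Result)>0 then
    Move(s[1], Result[0], Length(s));
end;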

You can call it with any type of string, and the RTL will convert from the encoding of the string you pass to UTF-8. So don't be tricked into thinking you must convert to UTF-8 before calling. Just pass in any string and let the RTL do the work.

After that it's a fairly standard array copy. Note the assertion that explicitly calls out the assumption on string element size for a UTF-8 encoded string.
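As a rough illustration of the "pass in any string" point, here is a minimal console sketch. The program name and the sample text are assumptions made up for the example, and it assumes a Unicode-enabled Delphi (2009 or later). The compiler may emit an implicit string cast warning on the call, which is the RTL conversion described above.

```pascal
program Utf8BytesUsage; // hypothetical demo program, not part of the answer

{$APPTYPE CONSOLE}

uses
  SysUtils;

function UTF8Bytes(const s: UTF8String): TBytes;
begin
  Assert(StringElementSize(s)=1);
  SetLength(Result, Length(s));
  if Length(Result)>0 then
    Move(s[1], Result[0], Length(s));
end;

var
  W: WideString;
  Bytes: TBytes;
begin
  // Sample text (an assumption): 'h', U+00E9, 'llo'.
  W := 'h'#$00E9'llo';
  // The WideString is converted to UTF8String at the call site.
  Bytes := UTF8Bytes(W);
  Writeln('UTF-16 code units: ', Length(W));
  Writeln('UTF-8 bytes: ', Length(Bytes));
end.
```

The two counts differ whenever the text contains non-ASCII characters, because those take more than one byte in UTF-8.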

If you also want the zero terminator, write it like this:

function UTF8Bytes(const s: UTF8String): TBytes;
begin
  Assert(StringElementSize(s)=1);
  // One extra byte for the terminator.
  SetLength(Result, Length(s)+1);
  // Guard on the source length: s[1] is invalid for an empty string.
  if Length(s)>0 then
    Move(s[1], Result[0], Length(s));
  Result[high(Result)] := 0;
end;

David Heffernan answered Sep 30 '22 16:09


You can use TEncoding.UTF8.GetBytes, declared in SysUtils.pas.
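A minimal sketch of that call, assuming Delphi 2009 or later, where TEncoding is available. The program name, variable names and sample text are made up for illustration. The WideString is first assigned to a string so that the string overload of GetBytes is selected.

```pascal
program Utf8GetBytesDemo; // hypothetical demo program

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  W: WideString;
  S: string;
  Bytes: TBytes;
  I: Integer;
begin
  // Sample text (an assumption): 'h', U+00E9, 'llo'.
  W := 'h'#$00E9'llo';
  // Assign to a UnicodeString so the string overload is used.
  S := W;
  Bytes := TEncoding.UTF8.GetBytes(S);
  // Print each byte as two hex digits.
  // Bytes expected for this input: 68 C3 A9 6C 6C 6F
  for I := 0 to High(Bytes) do
    Write(IntToHex(Bytes[I], 2), ' ');
  Writeln;
end.
```

The result has no zero terminator. If one is needed, the array can be extended by one element with SetLength, which zero-fills the new element.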

Mikael Eriksson answered Sep 30 '22 16:09